Introduction to R

Lecture 02: Data Wrangling in R

Student Name: Live HTML

Student ID: NA


0.1.0 About Introduction to R

Introduction to R is brought to you by the Centre for the Analysis of Genome Evolution & Function (CAGEF) bioinformatics training initiative. This course was developed based on feedback on the needs and interests of the Department of Cell & Systems Biology and the Department of Ecology and Evolutionary Biology.

The structure of this course is a code-along style: it is 100% hands-on! A few hours prior to each lecture, links to the materials will be available for download at Quercus. The teaching materials will consist of an R Markdown Notebook with concepts, comments, instructions, and blank coding spaces that you will fill out with R by coding along with the instructor. Other teaching materials include a live-updating HTML version of the notebook, and datasets to import into R - when required. This learning approach will allow you to spend the time coding and not taking notes!

As we go along, there will be some in-class challenge questions for you to solve either individually or in cooperation with your peers. Post lecture assessments will also be available (see syllabus for grading scheme and percentages of the final mark) through DataCamp to help cement and/or extend what you learn each week.

0.1.1 Where is this course headed?

We’ll take a blank slate approach here to R and assume that you pretty much know nothing about programming. From the beginning of this course to the end, we want to take you from some potential scenarios such as…

  • A pile of data (like an Excel file or tab-separated file) full of experimental observations that you don’t know what to do with.

  • Maybe you’re manipulating large tables all in Excel, making custom formulas and pivot tables with graphs. Now you have to repeat similar experiments and do the analysis again.

  • You’re generating high-throughput data and there aren’t any bioinformaticians around to help you sort it out.

  • You heard about R and what it could do for your data analysis but don’t know what that means or where to start.

and get you to a point where you can…

  • Format your data correctly for analysis.

  • Produce basic plots and perform exploratory analysis.

  • Make functions and scripts for re-analysing existing or new data sets.

  • Track your experiments in a digital notebook like R Markdown!

0.1.2 How do we get there? Step-by-step.

In the first lesson, we will talk about the basic data structures and objects in R, get cozy with the R Markdown Notebook environment, and learn how to get help when you are stuck because everyone gets stuck - a lot! Then you will learn how to get your data in and out of R, how to tidy our data (data wrangling), and then subset and merge data. After that, we will dig into the data and learn how to make basic plots for both exploratory data analysis and publication. We’ll follow that up with data cleaning and string manipulation; this is really the battleground of coding - getting your data into just the right format where you can analyse it more easily. We’ll then spend a lecture digging into the functions available for the statistical analysis of your data. Lastly, we will learn about control flow and how to write customized functions, which can really save you time and help scale up your analyses.

Don’t forget, the structure of the class is a code-along style: it is fully hands on. At the end of each lecture, the complete notes will be made available in a PDF format through the corresponding Quercus module so you don’t have to spend your attention on taking notes.


0.1.3 What kind of coding style will we learn?

There is no single correct path from A to B - although some paths may be more elegant, or more efficient than others. With that in mind, the emphasis in this lecture series will be on:

  1. Code simplicity - learn helpful functions that allow you to focus on understanding the basic tenets of good data wrangling (reformatting) to facilitate quick exploratory data analysis and visualization.
  2. Code readability - format and comment your code for yourself and others so that even those with minimal experience in R will be able to quickly grasp the overall steps in your code.
  3. Code stability - while the core R code is relatively stable, behaviours of functions can still change with updates. There are well-developed packages we’ll focus on for our analyses. Namely, we’ll become more familiar with the tidyverse series of packages. This resource is well-maintained by a large community of developers. While not always the “fastest” approach, this additional layer can help ensure your code still runs (somewhat) smoothly later down the road.

0.2.0 Class Objectives

This is the second in a series of seven lectures. Last lecture we discussed the basic functions and structures of R as well as how to navigate them. This week we will focus more on the data.frame object and learning how to manipulate the information it holds.

At the end of this session you will be familiar with importing data from plain text and Excel files; filtering, sorting, and re-arranging your data.frame objects using the dplyr package; the concept of piping command calls; and writing your resulting data to files. Our topics are broken into:

  1. Install and load packages for R
  2. Import data into R (tsv, csv, xlsx).
  3. Data inspection with the base R functions.
  4. Use the dplyr package to filter, subset and manipulate your data and to perform simple calculations.
  5. Exporting your data after manipulating it.


0.3.0 A legend for text format in R Markdown

  • Grey background: Command-line code, R library and function names. Backticks are also used for in-line code.
  • Italics or Bold italics: Emphasis for important ideas and concepts
  • Bold: Headers and subheaders
  • Blue text: Named or unnamed hyperlinks
  • ... fill in the code here if you are coding along

Blue box: A key concept that is being introduced

Yellow box: Risk or caution

Green boxes: Recommended reads and resources to learn R

Red boxes: A comprehension question which may or may not involve a coding cell. You usually find these at the end of a section.


0.4.0 Lecture and data files used in this course

0.4.1 Weekly Lecture and skeleton files

Each week, new lesson files will appear within your RStudio folders. We are pulling from a GitHub repository using this Repository git-pull link. Simply click on the link and it will take you to the University of Toronto datatools Hub. You will need to use your UTORid credentials to complete the login process. From there you will find each week’s lecture files in the directory /2025-09-IntroR/Lecture_XX. You will find a partially coded skeleton.Rmd file as well as all of the data files necessary to run the week’s lecture.

Alternatively, you can download the R-Markdown Notebook (.Rmd) and data files from the RStudio server to your personal computer if you would like to run independently of the Toronto tools.

0.4.2 Live-coding HTML page

A live lecture version will be available at camok.github.io that will update as the lecture progresses. Be sure to refresh to take a look if you get lost!

0.4.3 Post-lecture PDFs and Recordings

As mentioned above, at the end of each lecture there will be a completed version of the lecture code released as an HTML file under the Modules section of Quercus.


0.4.4 Data Set Description

The datasets used in this week’s class come from a manuscript published in PLoS Pathogens entitled “High-throughput phenotyping of infection by diverse microsporidia species reveals a wild C. elegans strain with opposing resistance and susceptibility traits” by Mok et al., 2023. These datasets focus on an analysis of infection in wild isolate strains of the nematode C. elegans by environmental pathogens known as microsporidia. The authors collected embryo counts from individual animals in the population after population-wide infection by microsporidia and we’ll spend our next few classes working with the dataset to learn how to format and manipulate it.

0.4.4.1 Dataset 1: /data/infection_meta.csv

This is a comma-separated version of the metadata from our measurements. This dataset tracks information for each experimental condition measured including experimental dates, reagent versions, and sample locations. We’ll use this file to ease our way into importing, manipulating, and exporting in today’s class.

0.4.4.2 Dataset 2: /data/infection_data_all.xlsx

This is a series of amalgamated datasets that we will use to show how we can import even entire Excel books into R. This file contains two sheets containing experimental measurements as well as the experimental metadata from Dataset 1.


1.0.0 Installing and importing packages

Packages are groups of related functions that serve a purpose. They can be a series of functions to help analyse specific data or they could be a group of functions used to simplify the process of formatting your data (more on that later in this lecture!).

Depending on their structure they may also rely on other packages.

1.1.0 Locating packages

There are a few different places you can install R packages from. Listed in order of decreasing trustworthiness:

CRAN (The Comprehensive R Archive Network)

  • Guidelines for submission, reviewed. Where the majority of packages are.

Bioconductor (Bioinformatics/Genomics focus)

  • Guidelines for submission, reviewed, and must have a vignette.

GitHub

  • No formal review process, but peers can open issues to highlight problems or suggest fixes.
  • There is an increasing number of publication-related packages.
  • Check to see the last time updates or comments were made to see if it is maintained by the developer.

Joe’s website

  • No review process. Not sure I trust that guy.

Regardless of where you download a package from, it’s a good idea to document its installation, especially if you had to troubleshoot the installation (you’ll eventually be there, I promise!)

devtools is a package that is used for developers to make R packages, but it also helps us to install packages from GitHub. It is downloaded from CRAN.


1.2.0 Installing packages for your RStudio (on JupyterHub)

Installing packages through your RStudio instance is relatively straightforward but any packages you install only remain during your current instance (login) of the hub. Whenever you log out from the JupyterHub (or datatools.utoronto.ca), these installed libraries will essentially vaporize.

The install.packages() command will work just as it should in a desktop version of RStudio.

# Always keep installation commands commented out
install.packages('devtools') 
## Warning: package 'devtools' is in use and will not be installed

1.2.1 Will it or won’t it install? Check for dependencies!

R may give you package installation warnings. Don’t panic. In general, your package will either be installed and R will test if the installed package can be loaded, or R will give you a non-zero exit status - which means your package was not installed. If you read the entire error message, it will give you a hint as to why the package did not install.

Some packages depend on previously developed packages and can only be installed after another package is installed in your library. Similarly, that previous package may depend on another package and so on. To solve this potential issue we use the dependencies logical parameter in our call.

install.packages('devtools', dependencies = TRUE)
## Warning: package 'devtools' is in use and will not be installed
# remove.packages("devtools") # Uninstall any CRAN package
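One way to avoid re-running an installation that is not needed is to check for the package first. The following is a minimal sketch, not part of the lecture code: ensure_package() is a hypothetical helper name, and it assumes a CRAN mirror is reachable whenever an installation is actually triggered.

```r
# Hypothetical helper (an assumption for illustration): install a CRAN package
# only when it is missing, then report whether its namespace can be loaded.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # dependencies = TRUE also pulls in the packages that pkg relies on
    install.packages(pkg, dependencies = TRUE)
  }
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing gets installed by this call
ensure_package("stats")
```

Because the check runs first, the helper can be left uncommented in a notebook without reinstalling the package every time the cell is run.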

1.2.2 Use library() to load your packages after installation

A package only has to be installed once. It is now in your library. To use a package, you must load the package into memory. Unless this is one of the packages R loads automatically, you choose which packages to load every session.

Installing libraries on datatools.utoronto.ca: Unlike on a personal installation of RStudio, we are running through an RStudio server which creates a fresh “instance” of an RStudio installation each time you log in. Some packages are pre-installed by system administrators but any packages outside of these essential ones will need to be installed every time you restart your RStudio instance. Keep that in mind!

library() loads one package per call and will throw an error if you try to load a package that is not already installed. You may see require() on help pages, which also loads packages. It is usually used inside functions (it returns FALSE with a warning instead of an error if a package is not installed).

Errors versus warnings: So far we’ve seen that errors will stop code from running. Warnings allow code to run until an error is reached. An eventual error may not be the result of a warning but it certainly leaves your code vulnerable to errors down the road.

# When we try to load this we will likely receive an error due to an older package being loaded
# Restart the kernel! It will keep the installed libraries but will unload the offending package.
library(devtools) 

# or 

#library('devtools')
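The difference between an error from library() and a warning from require() can be seen with a package that is not installed. This is a small sketch using a made-up package name that is assumed not to exist in your library; tryCatch() is used only so the error does not stop the notebook.

```r
# A package name assumed NOT to exist in your library (hypothetical)
missing_pkg <- "aPackageThatDoesNotExist"

# library() raises an error, which we catch here so the code keeps running
library_outcome <- tryCatch({
  library(missing_pkg, character.only = TRUE)
  "loaded"
}, error = function(e) "error")

# require() returns FALSE (with a warning) instead of stopping
require_outcome <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

library_outcome  # "error"
require_outcome  # FALSE
```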

1.3.0 Installing packages from Bioconductor requires BiocManager

To install from Bioconductor you can use the BiocManager package to help pull down and install other packages from the Bioconductor repository.

if (!requireNamespace("BiocManager", quietly = TRUE)) 
    install.packages("BiocManager") # this piece of code checks if BiocManager is installed. 
# If is not installed, it will do it for you. It does nothing if BiocManager is already installed.

# If you run this, it could take a while
BiocManager::install("GenomicRanges")
## 'getOption("repos")' replaces Bioconductor standard repositories, see
## 'help("repositories", package = "BiocManager")' for details.
## Replacement repositories:
##     CRAN: https://cran.r-project.org
## Bioconductor version 3.12 (BiocManager 1.30.22), R 4.0.5 (2021-03-31)
## Warning: package(s) not installed when version(s) same as or greater than current; use
##   `force = TRUE` to re-install: 'GenomicRanges'
## Installation paths not writeable, unable to update packages
##   path: C:/Program Files/R/R-4.0.5/library
##   packages:
##     boot, class, cluster, codetools, crayon, evaluate, foreign, KernSmooth,
##     lattice, mgcv, nlme, nnet, pbdZMQ, rpart, spatial, survival
## Old packages: 'abind', 'ade4', 'ape', 'backports', 'basefun', 'bdsmatrix',
##   'bench', 'BiocManager', 'bit', 'bit64', 'bitops', 'brew', 'brio', 'bslib',
##   'cachem', 'Cairo', 'callr', 'car', 'classInt', 'cli', 'clue', 'commonmark',
##   'corrplot', 'covr', 'cowplot', 'coxme', 'credentials', 'crosstalk', 'curl',
##   'data.table', 'DBI', 'dbplyr', 'dendextend', 'DEoptimR', 'desc', 'diffobj',
##   'digest', 'directlabels', 'downlit', 'dplyr', 'dreamerr', 'DT', 'e1071',
##   'fansi', 'fastmap', 'fixest', 'foghorn', 'fontawesome', 'fs', 'future',
##   'gapminder', 'gee', 'generics', 'geomtextpath', 'gert', 'ggExtra', 'ggforce',
##   'gghighlight', 'ggnewscale', 'ggpubr', 'ggsci', 'git2r', 'glmmTMB', 'glmnet',
##   'globals', 'glue', 'haven', 'highr', 'htmltools', 'htmlwidgets', 'httpuv',
##   'hunspell', 'ISwR', 'jpeg', 'jsonlite', 'knitr', 'Lahman', 'later', 'leaps',
##   'lintr', 'listenv', 'lme4', 'lubridate', 'maps', 'markdown', 'MatrixModels',
##   'matrixStats', 'mclust', 'mice', 'microbenchmark', 'mime', 'miniUI', 'minqa',
##   'mlt', 'mockery', 'mockr', 'modeltools', 'mratios', 'multcomp', 'mvtnorm',
##   'networkD3', 'nloptr', 'odbc', 'ordinal', 'packrat', 'pander', 'parallelly',
##   'parsedate', 'patchwork', 'pingr', 'pixmap', 'pkgbuild', 'pkgdown',
##   'pkgload', 'plyr', 'polyclip', 'prettyunits', 'processx', 'profmem',
##   'profvis', 'progress', 'promises', 'ps', 'quantreg', 'R.oo', 'R.utils',
##   'ragg', 'Rcpp', 'RcppArmadillo', 'RcppEigen', 'RcppTOML', 'RCurl', 'readr',
##   'readxl', 'remotes', 'renv', 'repr', 'reprex', 'reticulate', 'rhub', 'rjson',
##   'rlang', 'RMariaDB', 'rmarkdown', 'RMySQL', 'robustbase', 'roxygen2',
##   'RPostgres', 'RPostgreSQL', 'rprojroot', 'rsconnect', 'RSpectra', 'RSQLite',
##   'rstudioapi', 'rzmq', 's2', 'sandwich', 'sass', 'sessioninfo', 'sf', 'shiny',
##   'shinydashboard', 'showtext', 'SimComp', 'sp', 'SparseM', 'spatstat.data',
##   'spatstat.geom', 'spatstat.random', 'spatstat.utils', 'spelling', 'splancs',
##   'stringi', 'stringr', 'survPresmooth', 'sysfonts', 'systemfonts', 'testthat',
##   'textshaping', 'TH.data', 'tibble', 'tidyr', 'timechange', 'tinytex', 'TMB',
##   'tram', 'tzdb', 'ucminf', 'units', 'utf8', 'uuid', 'V8', 'variables',
##   'vctrs', 'vdiffr', 'vipor', 'viridis', 'vroom', 'waldo', 'webutils', 'wk',
##   'writexl', 'xfun', 'xml2', 'xopen', 'xts', 'yaml', 'zip', 'zoo'
#or 

#BiocManager::install(c("GenomicRanges", "ConnectivityMap"))

1.4.0 Skip loading a library with package::function()

As mentioned above in section 1.1.0, devtools is required to install from GitHub. We don’t actually need to load the entire library for devtools if we are only going to use one function. We can select a function using this syntax package::function().

Directly accessing functions: Sometimes we load libraries that contain the same function names! While these functions may behave completely differently, how does the R interpreter know which one we are referring to? By default, R will use the version from the most recently loaded package, which masks any earlier one. By using the package::function() syntax, we can tell R exactly which version of “conflicting” functions we wish to use!

devtools::install_github("tidyverse/googlesheets4")
## Using github PAT from envvar GITHUB_PAT
## Skipping install of 'googlesheets4' from a github remote, the SHA1 (55cd9fdb) has not changed since last install.
##   Use `force = TRUE` to force installation
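Name conflicts can be imitated without installing anything. In this sketch we define our own sd() in the workspace, which masks stats::sd() in the same way that a newly loaded package would mask a function from an earlier one. The replacement function is made up purely for illustration.

```r
# Define our own function that shares its name with stats::sd()
sd <- function(x) "my own sd() was called"

values <- c(1, 2, 3)

masked_result   <- sd(values)         # our version masks the stats one
explicit_result <- stats::sd(values)  # package::function() reaches the original

masked_result
explicit_result

# Remove our version so that sd() refers to stats::sd() again
rm(sd)
```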

All packages are loaded the same regardless of their origin, using library().

# Load googlesheets4 now from the library
library(googlesheets4)

1.5.0 Packages used in this lecture

The following packages are used in this lecture:

  • tidyverse (tidyverse installs several packages for you, like dplyr, readr, readxl, tibble, and ggplot2)
  • writexl used for writing multiple datasets to Excel files

#--------- Install packages for today's session ----------#
#install.packages("tidyverse", dependencies = TRUE) # This package should already be installed on JupyterHub

# This package should NOT already be installed on the RStudio server
if(!require("writexl")) install.packages("writexl", dependencies = TRUE)

#--------- Load packages for today's session ----------#
library(tidyverse)

# readxl, used for reading xlsx files, is installed with tidyverse but is not a core component when loading tidyverse
library(readxl) 
library(writexl)

2.0.0 Reading files in R


The most important thing when starting to work with your data is to know how to load it into the memory of the R kernel. There are a number of ways to read in files and each is suited to dealing with specific file types or file sizes, or may perform better depending on how you wish to read/store the file (all at once, a line at a time, or somewhere in between)!

There are many file formats you may come across in your journey but the most common will be CSV (comma-separated values), TSV (tab-separated values), FASTQ (usually used for storing biological sequences), or some archived (ZIP, GZ, TGZ) version of these. R is even able to open these archived versions in their native format! We may interchangeably use the word parsing to describe the action of reading/importing formatted data files.
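As a small illustration of reading a compressed file directly, the sketch below writes a made-up two-row CSV into a gzip-compressed temporary file and parses it with read_csv() from the readr package. The file contents and object names are assumptions for this example only.

```r
library(readr)

# Write a small, made-up CSV straight into a gzip-compressed temporary file
gz_path <- tempfile(fileext = ".csv.gz")
gz_con <- gzfile(gz_path, open = "w")
writeLines(c("strain,embryos", "N2,12", "AB1,9"), gz_con)
close(gz_con)

# read_csv() decompresses the file on the fly based on its .gz extension
compressed.tbl <- read_csv(gz_path, col_types = cols())
compressed.tbl
```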


2.1.0 Import data to a tibble with read_csv()

The tidyverse package has its own function for reading in text files because the tibble structure was first developed as part of the dplyr package! We’ll spend some time learning more about the differences between the tibble and data.frame objects in section 2.3.2. Since we’ll be spending our time working with the tidyverse, we may as well use its commands for importing files! If you want to learn how to do this with the base R utils package, check out the Appendix section for details.

Let’s look quickly at the read_csv() function which is a specific version of the read_delim() function from the readr package. The parameters we are interested in are:

  • file: The path to the file you want to import
  • col_names: TRUE (there is a header), FALSE (import without column names), or supply a character vector of custom names you want to use for your data columns.
  • col_types: NULL (default) and decides on column types itself, or a cols() specification of the data type for each column. Find more information in the ?read_csv details.
  • na: a character vector of strings to interpret as NA values. Very handy when you have values you want to identify and convert at import.

From this point on, we’ll pretty much use the terms tibble and data.frame interchangeably.

# ?read_csv

# Import our infection_meta.csv file from the data folder
infection_meta.tbl <- read_csv(file = "data/infection_meta.csv", 
                               col_names = TRUE, 
                               col_types = cols()  
                               # Producing a blank cols() specification suppresses any read_csv() output
                              )

# Check out the structure of our table
str(infection_meta.tbl)
## spc_tbl_ [276 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr [1:276] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:276] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Worm_strain      : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
##  $ Total Worms      : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:276] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
##  $ Total ul spore   : num [1:276] 0 56.8 113.6 0 56.8 ...
##  $ Infection Round  : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:276] 0 0.354 0.708 0 0.354 ...
##  $ Time plated      : num [1:276] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:276] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:276] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:276] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:276] 190513 190513 190513 190430 190513 ...
##  $ Stain type       : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##  $ Slide date       : num [1:276] 190515 190515 190515 190501 190515 ...
##  $ Slide number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Slide Box        : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:276] 190516 190516 190516 190502 190516 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   experiment = col_character(),
##   ..   experimenter = col_character(),
##   ..   description = col_character(),
##   ..   `Infection Date` = col_double(),
##   ..   `Plate Number` = col_double(),
##   ..   Worm_strain = col_character(),
##   ..   `Total Worms` = col_double(),
##   ..   `Spore Strain` = col_character(),
##   ..   `Spore Lot` = col_character(),
##   ..   `Lot concentration` = col_double(),
##   ..   `Total Spores (M)` = col_double(),
##   ..   `Total ul spore` = col_double(),
##   ..   `Infection Round` = col_double(),
##   ..   `40X OP50 (mL)` = col_double(),
##   ..   `Plate Size` = col_double(),
##   ..   `Spores(M)/cm2` = col_double(),
##   ..   `Time plated` = col_double(),
##   ..   `Time Incubated` = col_double(),
##   ..   Temp = col_double(),
##   ..   timepoint = col_character(),
##   ..   infection.type = col_character(),
##   ..   `Fixing Date` = col_double(),
##   ..   Location = col_character(),
##   ..   `Staining Date` = col_double(),
##   ..   `Stain type` = col_character(),
##   ..   `Slide date` = col_double(),
##   ..   `Slide number` = col_double(),
##   ..   `Slide Box` = col_double(),
##   ..   `Imaging Date` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

As you can see, it’s a pretty smooth process to parse simple text files. We’ve imported our CSV file and can see it has 276 rows (observations) and 29 columns (variables). In later sections we’ll learn some additional functions for manipulating this data object as we become familiar with the tidyverse package.
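The na and col_types parameters described earlier can be sketched on a tiny made-up file. Here the string "missing" is assumed to mark an absent measurement, and the date column is deliberately kept as text rather than being guessed as a number. All file contents and object names are hypothetical.

```r
library(readr)

# A small, made-up file where the text "missing" marks an absent measurement
toy_path <- tempfile(fileext = ".csv")
writeLines(c("strain,embryos,infection_date",
             "N2,12,190423",
             "AB1,missing,190423",
             "VC20019,7,190424"),
           toy_path)

toy.tbl <- read_csv(file = toy_path,
                    col_names = TRUE,
                    # Declare each column type rather than letting readr guess
                    col_types = cols(strain = col_character(),
                                     embryos = col_double(),
                                     infection_date = col_character()),
                    # Treat these strings as NA at import
                    na = c("", "NA", "missing"))

toy.tbl$embryos         # the "missing" entry is imported as NA
toy.tbl$infection_date  # kept as text instead of being parsed as a number
```

Declaring the types up front also means an unexpected value is flagged as a parsing problem at import rather than silently changing a column's type.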


2.2.0 Read Excel spreadsheets with the readxl package

What happens if we have an Excel file? The readxl package, which is installed as part of the tidyverse package, will recognize both xls and xlsx files. It expects tabular data, which is what these file types hold.

Note that back in section 1.5.0, we loaded the tidyverse package and explicitly loaded readxl so we can use the read_excel() function to accomplish our task. Some parameters we are interested in are:

  • path: The path to the file you want to import.
  • sheet: The sheet you want to read either as a string (ie “sheet name”) or integer (position).
  • col_names: TRUE (there is a header), FALSE (import as is), or supply a character vector of custom names you want to use for your data columns.
  • col_types: NULL (default) and decides on column types itself, or a character vector containing the column types listed as “skip”, “guess”, “logical”, “numeric”, “date”, “text”, or “list”.
  • na: a character vector of strings to interpret as NA values. Very handy when you have values you want to identify and convert at import.
  • range: a way to specify a rectangular area to take data from your Excel file.

First, let’s try to open our Excel file with read_csv().

# read_csv() doesn't work for Excel files
head(read_csv("data/infection_data_all.xlsx"))
## Multiple files in zip: reading '[Content_Types].xml'
## Rows: 1 Columns: 1
## -- Column specification -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
## Delimiter: ","
## chr (1): <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 1 x 1
##   `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`                     
##   <chr>                                                                         
## 1 "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\~

Looks like it didn’t work… An .xlsx file is actually a zipped collection of XML files, so a lot of file metadata exists alongside the actual data. Here read_csv() simply opened the first file in the archive ([Content_Types].xml), which is the extra information we see now. Therefore the .xlsx file cannot be imported correctly with this function.

Now let’s try read_excel().

# The readxl package is not a core component of the tidyverse so we need to load it
require(readxl) # Note that we've already loaded it in section 1.5.0

# let's take a peek at what happens when we import without any extra arguments
head(read_excel("data/infection_data_all.xlsx"))
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## 4 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 5 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 6 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores/cm2` <dbl>,
## #   `Time plated` <chr>, `Time Incubated` <chr>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...

2.2.1 Retrieve excel sheet names with excel_sheets()

Why doesn’t our output look like a workbook with multiple sheets? The read_excel() function defaults to reading in the first worksheet. You can specify which sheet you want to read in by position or name with the sheet parameter.

How will you know what the sheet names are for your workbook? You can see the name of your sheets using the excel_sheets() function which returns a character vector of names as output.

# grab the excel sheet names 
excel_sheets("data/infection_data_all.xlsx")
## [1] "infection_metadata" "embryo_data_wide"   "microsporidia_info"
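Since excel_sheets() returns a character vector, it can be used to loop over a workbook and import every sheet. The sketch below builds a small two-sheet workbook with writexl so that it runs on its own; the sheet names and values are made up and only stand in for a real file.

```r
library(readxl)
library(writexl)

# Build a small two-sheet workbook to stand in for a real file (made-up data)
book_path <- tempfile(fileext = ".xlsx")
write_xlsx(list(metadata = data.frame(plate = 1:3, strain = c("N2", "AB1", "N2")),
                counts   = data.frame(plate = 1:3, embryos = c(12, 9, 15))),
           path = book_path)

# Read every sheet into one named list of tibbles
sheet_names <- excel_sheets(book_path)
all_sheets <- lapply(sheet_names, function(sheet_name) {
  read_excel(book_path, sheet = sheet_name)
})
names(all_sheets) <- sheet_names

names(all_sheets)
all_sheets$counts
```

Each element of the list is a regular tibble, so a single sheet can be pulled out by name with the $ operator.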

2.2.2 Subset sheet and range within the read_excel() function

If we want to get fancy, it is possible to subset from a sheet by specifying cell numbers or ranges. Here we are grabbing sheet 1 (infection_metadata), and subsetting cells over a range defined by two cells - A3:D9.

For our purposes, the read_excel() function takes the default form of read_excel(path, sheet = NULL, range = NULL) but there are additional parameters we can supply to the function. See ?read_excel for more information.

# read in a specific sheet and range with read_excel()
read_excel(path = "data/infection_data_all.xlsx", 
           sheet = 1, 
           range = "A3:D9")
## # A tibble: 6 x 4
##   `190426_VC20019_LUAm1_10M_72hpi` CM    `Wild isolate phenoMIP retest` `190423`
##   <chr>                            <chr> <chr>                             <dbl>
## 1 190426_VC20019_LUAm1_20M_72hpi   CM    Wild isolate phenoMIP retest     190423
## 2 190426_N2_LUAm1_0M_72hpi         CM    Wild isolate phenoMIP retest     190423
## 3 190426_N2_LUAm1_10M_72hpi        CM    Wild isolate phenoMIP retest     190423
## 4 190426_N2_LUAm1_20M_72hpi        CM    Wild isolate phenoMIP retest     190423
## 5 190426_AB1_LUAm1_0M_72hpi        CM    Wild isolate phenoMIP retest     190423
## 6 190426_AB1_LUAm1_10M_72hpi       CM    Wild isolate phenoMIP retest     190423

Caution: Note from our above example that we no longer have proper column headings! Rather, the column names have been derived from the data in row 3 (cells A3:D3). Normally, if you had your column names in the first row, but wanted to jump to a specific row for importing the data, you might include the skip parameter. If you had a complex header of metadata where your true table begins at a later point, then the range parameter is more appropriate. If you simply wanted a subset of the data, you might be better off importing most of what you want and subsetting it from the dataframe after the columns are named. There are many additional ways to subset your data but it really depends on the level of complexity you wish to achieve with your subsetting. Always try to choose the path of least resistance.
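One possible way to keep the real column names while starting from a later row is to read the header on its own and then supply it through col_names. This is a sketch on a made-up workbook, not the lecture's dataset; the object names are assumptions.

```r
library(readxl)
library(writexl)

# Made-up workbook: row 1 holds the header, rows 2 to 5 hold the data
demo_path <- tempfile(fileext = ".xlsx")
write_xlsx(data.frame(id = c("a", "b", "c", "d"), value = c(1, 2, 3, 4)),
           path = demo_path)

# Step 1: read only the header row to capture the column names
header_names <- names(read_excel(demo_path, n_max = 0))

# Step 2: skip the header and the first data row, then re-apply the names
later_rows.tbl <- read_excel(demo_path, skip = 2, col_names = header_names)
later_rows.tbl
```

Because col_names is supplied explicitly, the first row read after the skip is treated as data rather than as a header.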

We could alternatively specify the sheet by name. Here we will also look at how you would simply grab specific rows of data using the cell_rows() helper function.

That’s right, we can supply a function’s output as an argument to a parameter!

# read in an Excel file by a specific row range

read_excel("data/infection_data_all.xlsx", 
           sheet = "infection_metadata", 
           range = cell_rows(1:9))
## # A tibble: 8 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## 4 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 5 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 6 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## 7 190426_AB1_LUAm1_0M_~ CM           Wild isola~           190423              7
## 8 190426_AB1_LUAm1_10M~ CM           Wild isola~           190423              8
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...

Note that if your first row is the header, a row range that excludes it will promote the first row of data to column names unless you include the parameter col_names = FALSE (or supply your own names through col_names).

Likewise, how would you subset just columns from the same sheet? We can use the cell_cols() helper function for that.

# read in an Excel file by a specific column range

head(read_excel(path = "data/infection_data_all.xlsx", 
                sheet = "infection_metadata", 
                range = cell_cols("B:D")))
## # A tibble: 6 x 3
##   experimenter description                  `Infection Date`
##   <chr>        <chr>                                   <dbl>
## 1 CM           Wild isolate phenoMIP retest           190423
## 2 CM           Wild isolate phenoMIP retest           190423
## 3 CM           Wild isolate phenoMIP retest           190423
## 4 CM           Wild isolate phenoMIP retest           190423
## 5 CM           Wild isolate phenoMIP retest           190423
## 6 CM           Wild isolate phenoMIP retest           190423

Using the range parameter: to learn more about the range parameter and using it with a series of helper functions, you can visit the readxl section on the tidyverse page.


2.3.0 lapply() is the list version of apply()

How would we read in all of the sheets at once? One solution is lapply(), a version of the apply() function that we learned about in Lecture 01 (section 4.3.0). lapply() takes a vector or list X as input and returns a list object of the same length as X. Each element of the returned list is the result of applying FUN to the corresponding element of X. Note that the elements of the returned list could be any kind of object!

For our examples, we can use lapply() so that each sheet from an xlsx file will be stored as a tibble inside of a list object. Recall that apply() took in a matrix-like object, a row/column specification (MARGIN), and a function (FUN).

lapply(), instead, drops the MARGIN parameter and takes in a vector or a list as the input. Remember that lists have a single dimension and thus do not have a row/column configuration. The basic parameters we require are:

  • X: A vector or list object
  • FUN: The function you wish to apply to each element of X.
  • ...: An unspecified number of additional parameters that are passed on to FUN as arguments for its parameters.
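To make the three parameters concrete before we use them on a workbook, here is a small sketch in base R. The list measurements and its element names are invented for this illustration; the point is only to show how an extra argument travels through ... to FUN.

```r
# A small named list of numeric vectors, some with missing values
measurements <- list(plate_a = c(1, NA, 3),
                     plate_b = c(4, 5, NA))

# X is the list, FUN is mean (note: no parentheses), and na.rm = TRUE
# is passed along through ... to every call of mean()
plate_means <- lapply(X = measurements, FUN = mean, na.rm = TRUE)

# The result is a list with one element per element of X
str(plate_means)
```

Because X carried names, the returned list keeps them, which makes the individual results easy to pull out later.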

So far we have been accustomed to functions finding our variables globally (in the global environment). lapply(), however, calls FUN once for each element of X and hands it only that element plus whatever we pass along through ..., so we need to explicitly provide our path. We will get more into local vs. global variables in our control flow lesson (lecture 07). For now, just know we can read in all worksheets from an Excel workbook.

#?lapply

# Use lapply and provide a character vector of Excel sheet names, then apply a function to each element (sheet name) of the vector!
excel_sheets_list <- lapply(X = excel_sheets("data/infection_data_all.xlsx"), # this will set X to a character vector
                            FUN = read_excel, # Note the lack of parentheses!
                            path = "data/infection_data_all.xlsx" # This is an argument for read_excel()
                           ) 

# What is the structure of our excel_sheets_list?
str(excel_sheets_list)
## List of 3
##  $ : tibble [276 x 29] (S3: tbl_df/tbl/data.frame)
##   ..$ experiment       : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##   ..$ experimenter     : chr [1:276] "CM" "CM" "CM" "CM" ...
##   ..$ description      : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##   ..$ Infection Date   : num [1:276] 190423 190423 190423 190423 190423 ...
##   ..$ Plate Number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ Worm_strain      : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
##   ..$ Total Worms      : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##   ..$ Spore Strain     : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##   ..$ Spore Lot        : chr [1:276] "2A" "2A" "2A" "2A" ...
##   ..$ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##   ..$ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
##   ..$ Total ul spore   : num [1:276] 0 56.8 113.6 0 56.8 ...
##   ..$ Infection Round  : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ 40X OP50 (mL)    : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##   ..$ Plate Size       : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
##   ..$ Spores/cm2       : num [1:276] 0 0.354 0.708 0 0.354 ...
##   ..$ Time plated      : chr [1:276] "1300" "1300" "1300" "1300" ...
##   ..$ Time Incubated   : chr [1:276] "1600" "1600" "1600" "1600" ...
##   ..$ Temp             : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
##   ..$ timepoint        : chr [1:276] "72hpi" "72hpi" "72hpi" "72hpi" ...
##   ..$ infection.type   : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
##   ..$ Fixing Date      : num [1:276] 190426 190426 190426 190426 190426 ...
##   ..$ Location         : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##   ..$ Staining Date    : num [1:276] 190513 190513 190513 190430 190513 ...
##   ..$ Stain type       : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##   ..$ Slide date       : num [1:276] 190515 190515 190515 190501 190515 ...
##   ..$ Slide number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ Slide Box        : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ Imaging Date     : num [1:276] 190516 190516 190516 190502 190516 ...
##  $ : tibble [154 x 301] (S3: tbl_df/tbl/data.frame)
##   ..$ worm.number                            : num [1:154] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ 200707_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_18" "0_0_18" "0_0_9" "0_0_15" ...
##   ..$ 200707_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_7" "0_1_3" "0_1_10" "0_1_8" ...
##   ..$ 200707_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_10" "0_0_9" "0_0_13" "0_0_10" ...
##   ..$ 200707_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200707_ED3052A_LUAm1_0M_72hpi          : chr [1:154] "0_0_12" "0_0_11" "0_0_14" "0_0_11" ...
##   ..$ 200707_ED3052A_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_3" ...
##   ..$ 200707_ED3052B_LUAm1_0M_72hpi          : chr [1:154] "0_0_5" "0_0_12" "0_0_11" "0_0_9" ...
##   ..$ 200707_ED3052B_LUAm1_10M_72hpi         : chr [1:154] "0_1_9" "0_1_2" "0_1_4" "0_1_0" ...
##   ..$ 200707_MY1_LUAm1_0M_72hpi              : chr [1:154] "0_0_11" "0_0_9" "0_0_10" "0_0_11" ...
##   ..$ 200707_MY1_LUAm1_10M_72hpi             : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_0" ...
##   ..$ 200707_N2_MAM1_4M_72hpi                : chr [1:154] "0_1_15" "0_1_18" "0_1_15" "0_1_13" ...
##   ..$ 200707_JU1400_MAM1_4M_72hpi            : chr [1:154] "1_1_2" "1_1_3" "0_0_8" "0_1_4" ...
##   ..$ 200707_N2_LUAm3_10M_72hpi              : chr [1:154] "0_1_27" "0_0_7" "0_1_16" "0_1_12" ...
##   ..$ 200707_JU1400_LUAm3_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200707_N2_AWRm78_3.5M_72hpi            : chr [1:154] "1_1_15" "1_1_4" "1_1_23" "1_1_18" ...
##   ..$ 200707_JU1400_AWRm78_3.5M_72hpi        : chr [1:154] "0_1_0" "0_0_0" "0_1_3" "1_1_4" ...
##   ..$ 200707_N2_ERTm5_1.75M_72hpi            : chr [1:154] "1_1_9" "0_1_13" "0_1_0" "1_1_8" ...
##   ..$ 200707_N2_ERTm5_3.5M_72hpi             : chr [1:154] "1_1_1" "1_1_4" "0_1_12" "1_1_13" ...
##   ..$ 200707_JU1400_ERTm5_1.75M_72hpi        : chr [1:154] "0_0_3" "0_0_3" "0_0_5" "0_1_8" ...
##   ..$ 200707_JU1400_ERTm5_3.5M_72hpi         : chr [1:154] "0_0_10" "0_0_0" "0_0_3" "0_0_5" ...
##   ..$ 200707_MY1_ERTm5_1.75M_72hpi           : chr [1:154] "0_1_10" "0_1_13" "0_1_14" "0_1_6" ...
##   ..$ 200707_MY1_ERTm5_3.5M_72hpi            : chr [1:154] "0_1_10" "1_1_0" "0_1_12" "1_1_2" ...
##   ..$ 200714_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_11" "0_0_19" "0_0_13" "0_0_13" ...
##   ..$ 200714_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_11" "0_1_9" "0_1_4" "0_1_10" ...
##   ..$ 200714_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_8" "0_0_14" "0_0_4" "0_0_10" ...
##   ..$ 200714_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_0_0" "0_1_0" "0_1_0" "0_0_0" ...
##   ..$ 200714_ED3052A_LUAm1_0M_72hpi          : chr [1:154] "0_0_8" "0_0_9" "0_0_6" "0_0_8" ...
##   ..$ 200714_ED3052A_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_1" "0_1_0" "0_1_1" ...
##   ..$ 200714_ED3052B_LUAm1_0M_72hpi          : chr [1:154] "0_0_18" "0_0_7" "0_0_13" "0_0_20" ...
##   ..$ 200714_ED3052B_LUAm1_10M_72hpi         : chr [1:154] "0_1_5" "0_1_0" "0_1_3" "0_1_5" ...
##   ..$ 200714_MY1_LUAm1_0M_72hpi              : chr [1:154] "0_0_6" "0_0_10" "0_0_23" "0_0_13" ...
##   ..$ 200714_MY1_LUAm1_10M_72hpi             : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_N2_MAM1_4M_72hpi                : chr [1:154] "0_1_10" "0_1_9" "0_1_18" "0_1_16" ...
##   ..$ 200714_JU1400_MAM1_4M_72hpi            : chr [1:154] "0_1_0" "0_1_4" "0_1_5" "0_0_9" ...
##   ..$ 200714_N2_LUAm3_10M_72hpi              : chr [1:154] "0_1_9" "0_1_12" "0_1_12" "0_1_15" ...
##   ..$ 200714_JU1400_LUAm3_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_N2_AWRm78_3.5M_72hpi            : chr [1:154] "1_1_9" "1_1_11" "1_1_11" "1_1_9" ...
##   ..$ 200714_JU1400_AWRm78_3.5M_72hpi        : chr [1:154] "0_1_11" "0_1_5" "0_1_4" "0_1_10" ...
##   ..$ 200714_N2_ERTm5_1.75M_72hpi            : chr [1:154] "1_1_16" "1_1_14" "1_1_15" "1_1_8" ...
##   ..$ 200714_N2_ERTm5_3.5M_72hpi             : chr [1:154] "1_1_9" "1_1_6" "1_1_11" "1_1_15" ...
##   ..$ 200714_JU1400_ERTm5_1.75M_72hpi        : chr [1:154] "0_0_10" "0_1_10" "0_1_8" "0_1_10" ...
##   ..$ 200714_JU1400_ERTm5_3.5M_72hpi         : chr [1:154] "0_0_3" "0_0_7" "0_0_4" "0_0_2" ...
##   ..$ 200714_MY1_ERTm5_1.75M_72hpi           : chr [1:154] "0_1_9" "1_1_4" "0_1_14" "0_1_16" ...
##   ..$ 200714_MY1_ERTm5_3.5M_72hpi            : chr [1:154] "0_1_9" "0_1_8" "1_1_15" "0_1_10" ...
##   ..$ 200714_N2_LUAm1_15M_72hpi              : chr [1:154] "NA" "0_1_15" "0_1_10" "0_1_3" ...
##   ..$ 200714_JU1400_LUAm1_15M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_ED3052A_LUAm1_15M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_0" ...
##   ..$ 200714_ED3052B_LUAm1_15M_72hpi         : chr [1:154] "0_1_6" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_MY1_LUAm1_15M_72hpi             : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_N2_MAM1_8M_72hpi                : chr [1:154] "0_1_9" "0_1_9" "0_1_15" "0_1_0" ...
##   ..$ 200714_JU1400_MAM1_8M_72hpi            : chr [1:154] "0_1_6" "0_1_5" "0_1_4" "0_1_8" ...
##   ..$ 200721_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_20" "0_0_19" "0_0_16" "0_0_6" ...
##   ..$ 200721_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_10" "0_1_13" "0_1_11" "0_1_12" ...
##   ..$ 200721_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_7" "0_0_0" "0_0_10" "0_0_10" ...
##   ..$ 200721_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200721_ED3052A_LUAm1_0M_72hpi          : chr [1:154] "0_0_12" "0_0_17" "0_0_13" "0_0_12" ...
##   ..$ 200721_ED3052A_LUAm1_10M_72hpi         : chr [1:154] "0_1_6" "0_1_9" "0_1_5" "0_1_10" ...
##   ..$ 200721_ED3052B_LUAm1_0M_72hpi          : chr [1:154] "0_0_10" "0_0_12" "0_0_9" "0_0_8" ...
##   ..$ 200721_ED3052B_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200721_MY1_LUAm1_0M_72hpi              : chr [1:154] "0_0_10" "0_0_11" "0_0_8" "0_0_0" ...
##   ..$ 200721_MY1_LUAm1_10M_72hpi             : chr [1:154] "0_1_0" "0_1_3" "0_1_0" "0_1_0" ...
##   ..$ 200721_N2_MAM1_4M_72hpi                : chr [1:154] "0_1_21" "0_1_17" "0_1_17" "0_1_10" ...
##   ..$ 200721_JU1400_MAM1_4M_72hpi            : chr [1:154] "0_1_4" "0_0_7" "0_1_5" "1_1_7" ...
##   ..$ 200721_N2_LUAm3_10M_72hpi              : chr [1:154] "0_1_11" "0_1_12" "0_1_8" "0_1_15" ...
##   ..$ 200721_JU1400_LUAm3_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200721_N2_AWRm78_3.5M_72hpi            : chr [1:154] "1_1_4" "1_1_9" "1_1_12" "0_1_14" ...
##   ..$ 200721_JU1400_AWRm78_3.5M_72hpi        : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "1_1_7" ...
##   ..$ 200721_N2_ERTm5_1.75M_72hpi            : chr [1:154] "1_1_8" "1_1_12" "1_1_12" "0_1_16" ...
##   ..$ 200721_N2_ERTm5_3.5M_72hpi             : chr [1:154] "1_1_6" "1_1_4" "1_1_1" "1_1_13" ...
##   ..$ 200721_JU1400_ERTm5_1.75M_72hpi        : chr [1:154] "0_0_5" "0_0_11" "1_1_7" "0_0_0" ...
##   ..$ 200721_JU1400_ERTm5_3.5M_72hpi         : chr [1:154] "0_1_3" "0_0_1" "0_1_2" "0_0_0" ...
##   ..$ 200721_MY1_ERTm5_1.75M_72hpi           : chr [1:154] "1_1_15" "0_1_10" "1_1_20" "0_1_13" ...
##   ..$ 200721_MY1_ERTm5_3.5M_72hpi            : chr [1:154] "1_1_9" "0_1_6" "0_1_12" "0_1_12" ...
##   ..$ 200721_N2_MAM1_8M_72hpi                : chr [1:154] "1_1_16" "0_1_10" "0_1_18" "0_1_19" ...
##   ..$ 200721_JU1400_MAM1_8M_72hpi            : chr [1:154] "0_1_3" "0_1_0" "1_1_3" "1_1_0" ...
##   ..$ 200821_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_12" "0_0_14" "0_0_24" "0_0_14" ...
##   ..$ 200821_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_9" "0_1_13" "0_1_10" "0_1_14" ...
##   ..$ 200821_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_15" "0_0_10" "0_0_11" "0_0_17" ...
##   ..$ 200821_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_4" "0_1_0" "0_1_0" ...
##   ..$ 200821_VC40171_LUAm1_0M_72hpi          : chr [1:154] "0_0_10" "0_0_11" "0_0_13" "0_0_9" ...
##   ..$ 200821_VC40171_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200821_AWR144_LUAm1_0M_72hpi           : chr [1:154] "0_0_23" "0_0_20" "0_0_22" "0_0_3" ...
##   ..$ 200821_AWR144_LUAm1_10M_72hpi          : chr [1:154] "0_1_9" "0_1_11" "0_1_9" "0_1_13" ...
##   ..$ 200821_AWR145_LUAm1_0M_72hpi           : chr [1:154] "0_0_30" "0_0_23" "0_0_24" "0_0_21" ...
##   ..$ 200821_AWR145_LUAm1_10M_72hpi          : chr [1:154] "0_1_2" "0_1_0" "0_1_12" "0_1_11" ...
##   ..$ 200821_N2_LUAm1-HK_10M_72hpi           : chr [1:154] "0_0_12" "0_0_23" "0_0_0" "0_0_15" ...
##   ..$ 200821_JU1400_LUAm1-HK_10M_72hpi       : chr [1:154] "0_0_14" "0_0_8" "0_0_15" "0_0_17" ...
##   ..$ 200821_N2_LUAm1-sup_10M_72hpi          : chr [1:154] "0_0_24" "0_0_24" "0_0_26" "0_0_21" ...
##   ..$ 200821_JU1400_LUAm1-sup_10M_72hpi      : chr [1:154] "0_0_13" "0_0_9" "0_0_11" "0_0_11" ...
##   ..$ 200821_N2_LUAm1-pel_10M_72hpi          : chr [1:154] "0_1_9" "0_1_11" "0_1_14" "0_1_15" ...
##   ..$ 200821_JU1400_LUAm1-pel_10M_72hpi      : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_1" ...
##   ..$ 200825_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_26" "0_0_24" "0_0_23" "0_0_17" ...
##   ..$ 200825_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_13" "0_1_12" "0_1_15" "0_1_17" ...
##   ..$ 200825_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_10" "0_0_12" "0_0_14" "0_0_14" ...
##   ..$ 200825_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200825_VC40171_LUAm1_0M_72hpi          : chr [1:154] "0_0_8" "0_0_11" "0_0_11" "0_0_5" ...
##   ..$ 200825_VC40171_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200825_AWR144_LUAm1_0M_72hpi           : chr [1:154] "0_0_14" "0_0_27" "0_0_14" "0_0_15" ...
##   .. [list output truncated]
##  $ : tibble [7 x 5] (S3: tbl_df/tbl/data.frame)
##   ..$ spore.strain             : chr [1:7] "ERTm1" "ERTm2" "ERTm5" "LUAm1" ...
##   ..$ spore.species            : chr [1:7] "N. parisii" "N. ausubeli" "N. ironsii" "N. ferruginous" ...
##   ..$ infection.location       : chr [1:7] "intestine" "intestine" "intestine" "epidermis" ...
##   ..$ original.nematode.species: chr [1:7] "C. elegans" "C. briggsae" "C. briggsae" "C. elegans" ...
##   ..$ original.location        : chr [1:7] "France" "India" "Hawaii, USA" "France" ...

It’s a lot of output, but if we look carefully we can see an unnamed list of 3 elements, each of which is a tibble object.
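An unnamed list can only be indexed by position. If you would rather look sheets up by name, one option is to attach the names after the fact with setNames(). The sketch below uses base R only: the sheet names and the read_stub() function are hypothetical stand-ins, so that the example runs without the course workbook.

```r
# Hypothetical sheet names; in practice these would come from excel_sheets()
sheet_names <- c("sheet_one", "sheet_two", "sheet_three")

# Stand-in for read_excel(): returns a one-row data frame recording the sheet
read_stub <- function(sheet) data.frame(source_sheet = sheet)

# lapply() on an unnamed character vector returns an unnamed list ...
unnamed_list <- lapply(X = sheet_names, FUN = read_stub)
names(unnamed_list)

# ... so we attach the names afterwards with setNames()
named_list <- setNames(unnamed_list, sheet_names)
names(named_list)

# Elements can now be retrieved by name instead of by position
named_list[["sheet_two"]]
```

The same pattern should carry over to the real workbook by swapping read_stub for the read_excel() call used above.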

2.3.1 The finer details of lapply()

Remember the parameters of

`read_excel(path, sheet = NULL, range = NULL)`

Notice that the second positional parameter is sheet. In our lapply() call we didn’t specifically name that parameter! Recall we used:

lapply(X = excel_sheets("data/infection_data_all.xlsx"), FUN = read_excel, path = "data/infection_data_all.xlsx")

and thus explicitly named the first parameter, path. Each element of X was therefore matched to the next available parameter in the default order, which is sheet. We now have a list object with each worksheet being one item in the list.
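This argument matching can be checked with a toy function. The sketch below is illustrative only: report_args() is a made-up function that mimics the first two parameters of read_excel() and simply reports which value landed where, and "workbook.xlsx" is a placeholder string, not a real file.

```r
# Toy function with the same first two parameters as read_excel();
# it only reports which value ended up in which parameter
report_args <- function(path, sheet = NULL) {
  paste0("path = ", path, ", sheet = ", sheet)
}

# path is named, so each element of X falls into the next free slot: sheet
matched <- lapply(X = c("first", "second"),
                  FUN = report_args,
                  path = "workbook.xlsx")

unlist(matched)
```

R matches named arguments first and fills the remaining parameters by position, which is why the unnamed element of X ends up in sheet rather than in path.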

If we wanted to explicitly name the sheet parameter in our call, we would need to define our own function in the FUN parameter. While we won’t learn about defining functions until lecture 07, you should be familiar with this idea from lecture 01 (section 4.3.2). In this case, you could use the following code:

# You can define your function directly with FUN = function(x)
str(lapply(X = excel_sheets("data/infection_data_all.xlsx"), # this will set X to a character vector
           FUN = function(x) read_excel(path = "data/infection_data_all.xlsx", 
                                        sheet = x)
          ) # end of lapply
   ) # end of str
## List of 3
##  $ : tibble [276 x 29] (S3: tbl_df/tbl/data.frame)
##   ..$ experiment       : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##   ..$ experimenter     : chr [1:276] "CM" "CM" "CM" "CM" ...
##   ..$ description      : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##   ..$ Infection Date   : num [1:276] 190423 190423 190423 190423 190423 ...
##   ..$ Plate Number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ Worm_strain      : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
##   ..$ Total Worms      : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##   ..$ Spore Strain     : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##   ..$ Spore Lot        : chr [1:276] "2A" "2A" "2A" "2A" ...
##   ..$ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##   ..$ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
##   ..$ Total ul spore   : num [1:276] 0 56.8 113.6 0 56.8 ...
##   ..$ Infection Round  : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ 40X OP50 (mL)    : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##   ..$ Plate Size       : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
##   ..$ Spores/cm2       : num [1:276] 0 0.354 0.708 0 0.354 ...
##   ..$ Time plated      : chr [1:276] "1300" "1300" "1300" "1300" ...
##   ..$ Time Incubated   : chr [1:276] "1600" "1600" "1600" "1600" ...
##   ..$ Temp             : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
##   ..$ timepoint        : chr [1:276] "72hpi" "72hpi" "72hpi" "72hpi" ...
##   ..$ infection.type   : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
##   ..$ Fixing Date      : num [1:276] 190426 190426 190426 190426 190426 ...
##   ..$ Location         : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##   ..$ Staining Date    : num [1:276] 190513 190513 190513 190430 190513 ...
##   ..$ Stain type       : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##   ..$ Slide date       : num [1:276] 190515 190515 190515 190501 190515 ...
##   ..$ Slide number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ Slide Box        : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ Imaging Date     : num [1:276] 190516 190516 190516 190502 190516 ...
##  $ : tibble [154 x 301] (S3: tbl_df/tbl/data.frame)
##   ..$ worm.number                            : num [1:154] 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ 200707_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_18" "0_0_18" "0_0_9" "0_0_15" ...
##   ..$ 200707_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_7" "0_1_3" "0_1_10" "0_1_8" ...
##   ..$ 200707_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_10" "0_0_9" "0_0_13" "0_0_10" ...
##   ..$ 200707_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200707_ED3052A_LUAm1_0M_72hpi          : chr [1:154] "0_0_12" "0_0_11" "0_0_14" "0_0_11" ...
##   ..$ 200707_ED3052A_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_3" ...
##   ..$ 200707_ED3052B_LUAm1_0M_72hpi          : chr [1:154] "0_0_5" "0_0_12" "0_0_11" "0_0_9" ...
##   ..$ 200707_ED3052B_LUAm1_10M_72hpi         : chr [1:154] "0_1_9" "0_1_2" "0_1_4" "0_1_0" ...
##   ..$ 200707_MY1_LUAm1_0M_72hpi              : chr [1:154] "0_0_11" "0_0_9" "0_0_10" "0_0_11" ...
##   ..$ 200707_MY1_LUAm1_10M_72hpi             : chr [1:154] "0_1_0" "0_1_0" "0_1_1" "0_1_0" ...
##   ..$ 200707_N2_MAM1_4M_72hpi                : chr [1:154] "0_1_15" "0_1_18" "0_1_15" "0_1_13" ...
##   ..$ 200707_JU1400_MAM1_4M_72hpi            : chr [1:154] "1_1_2" "1_1_3" "0_0_8" "0_1_4" ...
##   ..$ 200707_N2_LUAm3_10M_72hpi              : chr [1:154] "0_1_27" "0_0_7" "0_1_16" "0_1_12" ...
##   ..$ 200707_JU1400_LUAm3_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200707_N2_AWRm78_3.5M_72hpi            : chr [1:154] "1_1_15" "1_1_4" "1_1_23" "1_1_18" ...
##   ..$ 200707_JU1400_AWRm78_3.5M_72hpi        : chr [1:154] "0_1_0" "0_0_0" "0_1_3" "1_1_4" ...
##   ..$ 200707_N2_ERTm5_1.75M_72hpi            : chr [1:154] "1_1_9" "0_1_13" "0_1_0" "1_1_8" ...
##   ..$ 200707_N2_ERTm5_3.5M_72hpi             : chr [1:154] "1_1_1" "1_1_4" "0_1_12" "1_1_13" ...
##   ..$ 200707_JU1400_ERTm5_1.75M_72hpi        : chr [1:154] "0_0_3" "0_0_3" "0_0_5" "0_1_8" ...
##   ..$ 200707_JU1400_ERTm5_3.5M_72hpi         : chr [1:154] "0_0_10" "0_0_0" "0_0_3" "0_0_5" ...
##   ..$ 200707_MY1_ERTm5_1.75M_72hpi           : chr [1:154] "0_1_10" "0_1_13" "0_1_14" "0_1_6" ...
##   ..$ 200707_MY1_ERTm5_3.5M_72hpi            : chr [1:154] "0_1_10" "1_1_0" "0_1_12" "1_1_2" ...
##   ..$ 200714_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_11" "0_0_19" "0_0_13" "0_0_13" ...
##   ..$ 200714_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_11" "0_1_9" "0_1_4" "0_1_10" ...
##   ..$ 200714_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_8" "0_0_14" "0_0_4" "0_0_10" ...
##   ..$ 200714_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_0_0" "0_1_0" "0_1_0" "0_0_0" ...
##   ..$ 200714_ED3052A_LUAm1_0M_72hpi          : chr [1:154] "0_0_8" "0_0_9" "0_0_6" "0_0_8" ...
##   ..$ 200714_ED3052A_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_1" "0_1_0" "0_1_1" ...
##   ..$ 200714_ED3052B_LUAm1_0M_72hpi          : chr [1:154] "0_0_18" "0_0_7" "0_0_13" "0_0_20" ...
##   ..$ 200714_ED3052B_LUAm1_10M_72hpi         : chr [1:154] "0_1_5" "0_1_0" "0_1_3" "0_1_5" ...
##   ..$ 200714_MY1_LUAm1_0M_72hpi              : chr [1:154] "0_0_6" "0_0_10" "0_0_23" "0_0_13" ...
##   ..$ 200714_MY1_LUAm1_10M_72hpi             : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_N2_MAM1_4M_72hpi                : chr [1:154] "0_1_10" "0_1_9" "0_1_18" "0_1_16" ...
##   ..$ 200714_JU1400_MAM1_4M_72hpi            : chr [1:154] "0_1_0" "0_1_4" "0_1_5" "0_0_9" ...
##   ..$ 200714_N2_LUAm3_10M_72hpi              : chr [1:154] "0_1_9" "0_1_12" "0_1_12" "0_1_15" ...
##   ..$ 200714_JU1400_LUAm3_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_N2_AWRm78_3.5M_72hpi            : chr [1:154] "1_1_9" "1_1_11" "1_1_11" "1_1_9" ...
##   ..$ 200714_JU1400_AWRm78_3.5M_72hpi        : chr [1:154] "0_1_11" "0_1_5" "0_1_4" "0_1_10" ...
##   ..$ 200714_N2_ERTm5_1.75M_72hpi            : chr [1:154] "1_1_16" "1_1_14" "1_1_15" "1_1_8" ...
##   ..$ 200714_N2_ERTm5_3.5M_72hpi             : chr [1:154] "1_1_9" "1_1_6" "1_1_11" "1_1_15" ...
##   ..$ 200714_JU1400_ERTm5_1.75M_72hpi        : chr [1:154] "0_0_10" "0_1_10" "0_1_8" "0_1_10" ...
##   ..$ 200714_JU1400_ERTm5_3.5M_72hpi         : chr [1:154] "0_0_3" "0_0_7" "0_0_4" "0_0_2" ...
##   ..$ 200714_MY1_ERTm5_1.75M_72hpi           : chr [1:154] "0_1_9" "1_1_4" "0_1_14" "0_1_16" ...
##   ..$ 200714_MY1_ERTm5_3.5M_72hpi            : chr [1:154] "0_1_9" "0_1_8" "1_1_15" "0_1_10" ...
##   ..$ 200714_N2_LUAm1_15M_72hpi              : chr [1:154] "NA" "0_1_15" "0_1_10" "0_1_3" ...
##   ..$ 200714_JU1400_LUAm1_15M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_ED3052A_LUAm1_15M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_0" ...
##   ..$ 200714_ED3052B_LUAm1_15M_72hpi         : chr [1:154] "0_1_6" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_MY1_LUAm1_15M_72hpi             : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200714_N2_MAM1_8M_72hpi                : chr [1:154] "0_1_9" "0_1_9" "0_1_15" "0_1_0" ...
##   ..$ 200714_JU1400_MAM1_8M_72hpi            : chr [1:154] "0_1_6" "0_1_5" "0_1_4" "0_1_8" ...
##   ..$ 200721_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_20" "0_0_19" "0_0_16" "0_0_6" ...
##   ..$ 200721_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_10" "0_1_13" "0_1_11" "0_1_12" ...
##   ..$ 200721_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_7" "0_0_0" "0_0_10" "0_0_10" ...
##   ..$ 200721_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200721_ED3052A_LUAm1_0M_72hpi          : chr [1:154] "0_0_12" "0_0_17" "0_0_13" "0_0_12" ...
##   ..$ 200721_ED3052A_LUAm1_10M_72hpi         : chr [1:154] "0_1_6" "0_1_9" "0_1_5" "0_1_10" ...
##   ..$ 200721_ED3052B_LUAm1_0M_72hpi          : chr [1:154] "0_0_10" "0_0_12" "0_0_9" "0_0_8" ...
##   ..$ 200721_ED3052B_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200721_MY1_LUAm1_0M_72hpi              : chr [1:154] "0_0_10" "0_0_11" "0_0_8" "0_0_0" ...
##   ..$ 200721_MY1_LUAm1_10M_72hpi             : chr [1:154] "0_1_0" "0_1_3" "0_1_0" "0_1_0" ...
##   ..$ 200721_N2_MAM1_4M_72hpi                : chr [1:154] "0_1_21" "0_1_17" "0_1_17" "0_1_10" ...
##   ..$ 200721_JU1400_MAM1_4M_72hpi            : chr [1:154] "0_1_4" "0_0_7" "0_1_5" "1_1_7" ...
##   ..$ 200721_N2_LUAm3_10M_72hpi              : chr [1:154] "0_1_11" "0_1_12" "0_1_8" "0_1_15" ...
##   ..$ 200721_JU1400_LUAm3_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200721_N2_AWRm78_3.5M_72hpi            : chr [1:154] "1_1_4" "1_1_9" "1_1_12" "0_1_14" ...
##   ..$ 200721_JU1400_AWRm78_3.5M_72hpi        : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "1_1_7" ...
##   ..$ 200721_N2_ERTm5_1.75M_72hpi            : chr [1:154] "1_1_8" "1_1_12" "1_1_12" "0_1_16" ...
##   ..$ 200721_N2_ERTm5_3.5M_72hpi             : chr [1:154] "1_1_6" "1_1_4" "1_1_1" "1_1_13" ...
##   ..$ 200721_JU1400_ERTm5_1.75M_72hpi        : chr [1:154] "0_0_5" "0_0_11" "1_1_7" "0_0_0" ...
##   ..$ 200721_JU1400_ERTm5_3.5M_72hpi         : chr [1:154] "0_1_3" "0_0_1" "0_1_2" "0_0_0" ...
##   ..$ 200721_MY1_ERTm5_1.75M_72hpi           : chr [1:154] "1_1_15" "0_1_10" "1_1_20" "0_1_13" ...
##   ..$ 200721_MY1_ERTm5_3.5M_72hpi            : chr [1:154] "1_1_9" "0_1_6" "0_1_12" "0_1_12" ...
##   ..$ 200721_N2_MAM1_8M_72hpi                : chr [1:154] "1_1_16" "0_1_10" "0_1_18" "0_1_19" ...
##   ..$ 200721_JU1400_MAM1_8M_72hpi            : chr [1:154] "0_1_3" "0_1_0" "1_1_3" "1_1_0" ...
##   ..$ 200821_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_12" "0_0_14" "0_0_24" "0_0_14" ...
##   ..$ 200821_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_9" "0_1_13" "0_1_10" "0_1_14" ...
##   ..$ 200821_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_15" "0_0_10" "0_0_11" "0_0_17" ...
##   ..$ 200821_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_4" "0_1_0" "0_1_0" ...
##   ..$ 200821_VC40171_LUAm1_0M_72hpi          : chr [1:154] "0_0_10" "0_0_11" "0_0_13" "0_0_9" ...
##   ..$ 200821_VC40171_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200821_AWR144_LUAm1_0M_72hpi           : chr [1:154] "0_0_23" "0_0_20" "0_0_22" "0_0_3" ...
##   ..$ 200821_AWR144_LUAm1_10M_72hpi          : chr [1:154] "0_1_9" "0_1_11" "0_1_9" "0_1_13" ...
##   ..$ 200821_AWR145_LUAm1_0M_72hpi           : chr [1:154] "0_0_30" "0_0_23" "0_0_24" "0_0_21" ...
##   ..$ 200821_AWR145_LUAm1_10M_72hpi          : chr [1:154] "0_1_2" "0_1_0" "0_1_12" "0_1_11" ...
##   ..$ 200821_N2_LUAm1-HK_10M_72hpi           : chr [1:154] "0_0_12" "0_0_23" "0_0_0" "0_0_15" ...
##   ..$ 200821_JU1400_LUAm1-HK_10M_72hpi       : chr [1:154] "0_0_14" "0_0_8" "0_0_15" "0_0_17" ...
##   ..$ 200821_N2_LUAm1-sup_10M_72hpi          : chr [1:154] "0_0_24" "0_0_24" "0_0_26" "0_0_21" ...
##   ..$ 200821_JU1400_LUAm1-sup_10M_72hpi      : chr [1:154] "0_0_13" "0_0_9" "0_0_11" "0_0_11" ...
##   ..$ 200821_N2_LUAm1-pel_10M_72hpi          : chr [1:154] "0_1_9" "0_1_11" "0_1_14" "0_1_15" ...
##   ..$ 200821_JU1400_LUAm1-pel_10M_72hpi      : chr [1:154] "0_1_0" "0_1_0" "0_1_4" "0_1_1" ...
##   ..$ 200825_N2_LUAm1_0M_72hpi               : chr [1:154] "0_0_26" "0_0_24" "0_0_23" "0_0_17" ...
##   ..$ 200825_N2_LUAm1_10M_72hpi              : chr [1:154] "0_1_13" "0_1_12" "0_1_15" "0_1_17" ...
##   ..$ 200825_JU1400_LUAm1_0M_72hpi           : chr [1:154] "0_0_10" "0_0_12" "0_0_14" "0_0_14" ...
##   ..$ 200825_JU1400_LUAm1_10M_72hpi          : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200825_VC40171_LUAm1_0M_72hpi          : chr [1:154] "0_0_8" "0_0_11" "0_0_11" "0_0_5" ...
##   ..$ 200825_VC40171_LUAm1_10M_72hpi         : chr [1:154] "0_1_0" "0_1_0" "0_1_0" "0_1_0" ...
##   ..$ 200825_AWR144_LUAm1_0M_72hpi           : chr [1:154] "0_0_14" "0_0_27" "0_0_14" "0_0_15" ...
##   .. [list output truncated]
##  $ : tibble [7 x 5] (S3: tbl_df/tbl/data.frame)
##   ..$ spore.strain             : chr [1:7] "ERTm1" "ERTm2" "ERTm5" "LUAm1" ...
##   ..$ spore.species            : chr [1:7] "N. parisii" "N. ausubeli" "N. ironsii" "N. ferruginous" ...
##   ..$ infection.location       : chr [1:7] "intestine" "intestine" "intestine" "epidermis" ...
##   ..$ original.nematode.species: chr [1:7] "C. elegans" "C. briggsae" "C. briggsae" "C. elegans" ...
##   ..$ original.location        : chr [1:7] "France" "India" "Hawaii, USA" "France" ...

Remember that, from the list we generated, you can index the tibble you would like to work with using the syntax list[[x]] and store it in a variable using leftward assignment (<-).

Working with lists of data.frames (or tibbles) can be cumbersome, but applying multiple procedures to these objects is made easier by the purrr package, which extends R’s ability to apply functions to each element of a list.
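
As a minimal sketch of both ideas, the code below builds a small, hypothetical list of data frames (sheets_list is a stand-in, not our imported workbook), extracts one element with [[ ]], and then uses two purrr functions, map() and map_int(), to run the same function on every element. It assumes the purrr package is installed.

```r
library(purrr)

# Hypothetical stand-in for a list of imported worksheets
sheets_list <- list(
  metadata = data.frame(strain = c("N2", "JU1400"), plate = c(1, 2)),
  spores   = data.frame(spore = c("LUAm1", "ERTm5", "ERTm1"))
)

# [[ ]] extracts the element itself; [ ] returns a shorter list
first_sheet <- sheets_list[[1]]
class(first_sheet)       # "data.frame"
class(sheets_list[1])    # "list"

# Apply the same function to every element of the list
sheet_rows <- map_int(sheets_list, nrow)    # named integer vector, one value per sheet
sheet_cols <- map(sheets_list, colnames)    # list of column-name vectors

sheet_rows
sheet_cols
```

map() always returns a list, while map_int() insists that every result is a single integer and returns a plain vector, which is often easier to use downstream.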

# You can see the structure of our first list element. 
# Remember the difference between [[]] and []?
str(excel_sheets_list[[1]])
## tibble [276 x 29] (S3: tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr [1:276] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:276] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Worm_strain      : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
##  $ Total Worms      : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:276] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
##  $ Total ul spore   : num [1:276] 0 56.8 113.6 0 56.8 ...
##  $ Infection Round  : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores/cm2       : num [1:276] 0 0.354 0.708 0 0.354 ...
##  $ Time plated      : chr [1:276] "1300" "1300" "1300" "1300" ...
##  $ Time Incubated   : chr [1:276] "1600" "1600" "1600" "1600" ...
##  $ Temp             : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:276] "72hpi" "72hpi" "72hpi" "72hpi" ...
##  $ infection.type   : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:276] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:276] 190513 190513 190513 190430 190513 ...
##  $ Stain type       : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##  $ Slide date       : num [1:276] 190515 190515 190515 190501 190515 ...
##  $ Slide number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Slide Box        : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:276] 190516 190516 190516 190502 190516 ...

2.3.2 A tibble is essentially a data.frame

Notice that the object type of our imported sheet isn’t exactly a data.frame. Rather it is a tibble which is an extended version of the data.frame. Overall a tibble replicates the same behaviours as a data.frame except when printing/displaying (only outputs the first 10 rows vs. all) and in how we subset a single column. As long as you use methods from within the tidyverse, this construct will work just fine.

Subsetting a tibble using the index notation [, 1] returns a tibble object containing the first column of your data. In a data.frame, this same notation would return a vector object. This can sometimes cause type-errors when working with older functions or packages outside the tidyverse. If you want to retrieve a column vector from a tibble object, you can use the $ indexing notation or the dplyr::pull() function.

If you’d like to exclusively work with a data.frame, you can cast it using the as.data.frame() command.

# Pull a single column from our tibble
print("Indexing a column from a tibble is still a tibble")
## [1] "Indexing a column from a tibble is still a tibble"
str(excel_sheets_list[[1]][,1])
## tibble [276 x 1] (S3: tbl_df/tbl/data.frame)
##  $ experiment: chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
# Index a column with the $ notation but you need to know the name of your column
cat("\n") # print a blank line
print("Indexing a column into a vector with $")
## [1] "Indexing a column into a vector with $"
str(excel_sheets_list[[1]]$experiment)
##  chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" ...
# Index a column with the pull() function if you know its position or name
cat("\n")
print("Indexing a column into a vector with pull()")
## [1] "Indexing a column into a vector with pull()"
str(pull(excel_sheets_list[[1]], 1))
##  chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" ...
# Cast the tibble to a data.frame and then pull a single column
cat("\n")
print("Indexing a column from a data.frame becomes a vector")
## [1] "Indexing a column from a data.frame becomes a vector"
str(data.frame(excel_sheets_list[[1]])[,1])
##  chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" ...

2.3.3 Re-assign elements from a list to a new variable

At this point, we would like to just use our imported excel worksheet as a normal data.frame in R. We’ll assign it to a new variable metadata_sheet.df using the correct indexing notation.

If you are a googlesheets person, you can use the package we installed in section 1.4.0 (surprisingly called ‘googlesheets4’), which will allow you to get your worksheets in and out of R. For more information on googlesheets, check out the tidyverse/googlesheets4 page.

# Let's assign our first sheet to its own variable
metadata_sheet.df <- as.data.frame(excel_sheets_list[[1]])

str(metadata_sheet.df)
## 'data.frame':    276 obs. of  29 variables:
##  $ experiment       : chr  "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr  "CM" "CM" "CM" "CM" ...
##  $ description      : chr  "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num  190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Worm_strain      : chr  "VC20019" "VC20019" "VC20019" "N2" ...
##  $ Total Worms      : num  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr  "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr  "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num  176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num  0 10 20 0 10 20 0 10 20 0 ...
##  $ Total ul spore   : num  0 56.8 113.6 0 56.8 ...
##  $ Infection Round  : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num  0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num  6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores/cm2       : num  0 0.354 0.708 0 0.354 ...
##  $ Time plated      : chr  "1300" "1300" "1300" "1300" ...
##  $ Time Incubated   : chr  "1600" "1600" "1600" "1600" ...
##  $ Temp             : num  21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr  "72hpi" "72hpi" "72hpi" "72hpi" ...
##  $ infection.type   : chr  "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num  190426 190426 190426 190426 190426 ...
##  $ Location         : chr  "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num  190513 190513 190513 190430 190513 ...
##  $ Stain type       : chr  "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##  $ Slide date       : num  190515 190515 190515 190501 190515 ...
##  $ Slide number     : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ Slide Box        : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num  190516 190516 190516 190502 190516 ...

What’s the difference between data.frame() and as.data.frame()? Without getting bogged down in the details, there is a distinction between the two functions. The former can be used to create a data.frame from scratch. As we saw in Lecture 01, you can provide one or more vectors of the same length to produce a data.frame object. On the other hand, if you want to convert a data.frame-like object (e.g. a matrix, tibble or array) to a data.frame, you could use data.frame() BUT it is slightly slower than as.data.frame(), which is specifically designed to accept a single object and convert it into a data.frame.
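
The distinction can be sketched with two toy objects. The names strain_df, count_matrix and count_df are made up for illustration and are not part of our data set.

```r
# data.frame(): build a data frame from scratch using equal-length vectors
strain_df <- data.frame(strain = c("N2", "JU1400", "MY1"),
                        plate  = c(1, 2, 3))

# as.data.frame(): convert one existing rectangular object (here a matrix)
count_matrix <- matrix(1:6, nrow = 3,
                       dimnames = list(NULL, c("Site1", "Site2")))
count_df <- as.data.frame(count_matrix)

class(strain_df)   # "data.frame"
class(count_df)    # "data.frame"
str(count_df)      # two integer columns named Site1 and Site2
```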

Comprehension question 2.0.0: Compare the structure information for the tibble version of our imported data in section 2.3.1 versus the above data frame version of the data. What differences do you notice about the columns? Name some other differences between a tibble and a data frame.


Section 2.0.0 comprehension answer:


3.0.0 Inspecting your data

Image courtesy of xkcd at https://xkcd.com/2054/

We’ll often make assumptions about our datasets, like all of the values for a variable are within a certain range, or all positive. We also usually assume that all of the entries in our data are complete - no missing values or incorrect categories. This can be a bit of a trap - especially in large datasets where we cannot view it all by eye. Here we’ll discuss some helpful tools for inspecting your data before you start using more complex code for it.


3.1.0 Helpful commands for inspecting your data

When first importing data (especially from outside sources) it is best to inspect it for problems like missing values, inconsistent formatting, special characters, etc. Here, we’ll inspect our dataset, store it in a variable, and check out the structure by reviewing some helpful commands:

  1. class() to quickly determine the object type. You see this information in the str() command too.
  2. head() to quickly view just the first n rows of your data.
  3. tail() to quickly view just the last n rows of your data.
  4. unique() to quickly view the unique values in a vector or similar data structure.
  5. glimpse() and View() (in RStudio) to take a peek at your data structures.

3.1.1 Use head() to view the first portion of your data

You can take a look at the first few rows (6 by default) of your data.frame using the head() function. In fact you can play with the parameters to pull a specific number of rows or lines from the start of your data.frame or other object.

# Re-import our infection_meta.csv file from the data folder if you need to
# infection_meta.tbl <- read_csv(file = "data/infection_meta.csv", col_names = TRUE, col_types = cols)

# Use default head() parameters
head(infection_meta.tbl)
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## 4 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 5 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 6 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
# Pull just the first 3 rows
head(infection_meta.tbl, 3)
## # A tibble: 3 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_VC20019_LUAm1~ CM           Wild isola~           190423              1
## 2 190426_VC20019_LUAm1~ CM           Wild isola~           190423              2
## 3 190426_VC20019_LUAm1~ CM           Wild isola~           190423              3
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...

3.1.2 Use tail() to view the latter portion of your data

Likewise, to inspect the last rows, you can use the tail() function. Again, you can decide on how many rows from the end of your object that you’d like to see. Note that this still displays in the original order of the data frame rather than reverse.

# Let's pull up the last 10 rows to look at!
tail(infection_meta.tbl, 10)
## # A tibble: 10 x 29
##    experiment           experimenter description `Infection Date` `Plate Number`
##    <chr>                <chr>        <chr>                  <dbl>          <dbl>
##  1 200916_JU1400_ERTm5~ CM           NIL tests ~           200912             13
##  2 200916_JU1400_ERTm5~ CM           NIL tests ~           200912             14
##  3 200918_N2_ERTm5_0M_~ CM           NIL tests ~           200915              1
##  4 200918_N2_ERTm5_3.5~ CM           NIL tests ~           200915              2
##  5 200918_JU1400_ERTm5~ CM           NIL tests ~           200915              3
##  6 200918_JU1400_ERTm5~ CM           NIL tests ~           200915              4
##  7 200918_AWR144_ERTm5~ CM           NIL tests ~           200915              5
##  8 200918_AWR144_ERTm5~ CM           NIL tests ~           200915              6
##  9 200918_AWR145_ERTm5~ CM           NIL tests ~           200915              7
## 10 200918_AWR145_ERTm5~ CM           NIL tests ~           200915              8
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
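
Both head() and tail() also accept a negative n, which means "everything except" that many rows. The sketch below uses a hypothetical 10-row data frame (toy_df) rather than our metadata so the results are easy to check by eye.

```r
# Hypothetical toy data frame with 10 rows
toy_df <- data.frame(id = 1:10, value = (1:10)^2)

first_three  <- head(toy_df, n = 3)    # first 3 rows
all_but_last <- head(toy_df, n = -7)   # everything except the last 7 rows (rows 1 to 3)
last_two     <- tail(toy_df, n = 2)    # last 2 rows, still in original order (9 then 10)
all_but_head <- tail(toy_df, n = -8)   # everything except the first 8 rows (rows 9 and 10)

first_three
last_two
```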

3.1.3 Use unique() to retrieve a list of the unique elements within an object

You may be interested in knowing more about the data set you’re working with, such as “How many C. elegans strains or microsporidia strains are we working with across these experiments?” Recall that we have columns labeled Worm_strain and Spore Strain within our data set. Don’t worry, we’ll learn more about simplifying our column names later!

You could extract the whole column and scan through it or look at just a portion of it.

# Recall: Use the $ sign to access named columns within your data.frame!

infection_meta.tbl$Worm_strain
##   [1] "VC20019"     "VC20019"     "VC20019"     "N2"          "N2"         
##   [6] "N2"          "AB1"         "AB1"         "AB1"         "JU397"      
##  [11] "JU397"       "JU397"       "JU642"       "JU642"       "JU642"      
##  [16] "MY6"         "MY6"         "MY6"         "ED3042"      "ED3042"     
##  [21] "ED3042"      "JU360"       "JU360"       "JU360"       "JU1400"     
##  [26] "JU1400"      "JU1400"      "MY1"         "MY1"         "MY1"        
##  [31] "Lua1"        "Lua1"        "Lua1"        "VC20019"     "VC20019"    
##  [36] "VC20019"     "N2"          "N2"          "N2"          "CB4856"     
##  [41] "CB4856"      "CB4856"      "JU300"       "JU300"       "JU300"      
##  [46] "JU1400"      "JU1400"      "JU1400"      "MY2"         "MY2"        
##  [51] "MY2"         "VC20019"     "VC20019"     "VC20019"     "N2"         
##  [56] "N2"          "N2"          "JU360"       "JU360"       "JU360"      
##  [61] "MY2"         "MY2"         "MY2"         "N2"          "Lua1"       
##  [66] "JU1400"      "N2"          "Lua1"        "JU1400"      "N2"         
##  [71] "Lua1"        "JU1400"      "N2"          "Lua1"        "JU1400"     
##  [76] "N2"          "N2"          "JU1400"      "N2"          "JU1400"     
##  [81] "N2"          "JU1400"      "N2"          "JU1400"      "N2"         
##  [86] "JU1400"      "N2"          "JU1400"      "N2"          "JU1400"     
##  [91] "N2"          "JU1400"      "VC20019"     "VC20019"     "JU1400"     
##  [96] "VC20019"     "JU1400"      "VC20019"     "JU1400"      "VC20019"    
## [101] "JU1400"      "VC20019"     "JU1400"      "VC20019"     "JU1400"     
## [106] "VC20019"     "JU1400"      "VC20019"     "JU1400"      "VC20019"    
## [111] "JU1400"      "N2"          "N2"          "JU1400"      "JU1400"     
## [116] "ED3052A"     "ED3052A"     "ED3052B"     "ED3052B"     "MY1"        
## [121] "MY1"         "N2"          "JU1400"      "N2"          "JU1400"     
## [126] "N2"          "JU1400"      "N2"          "N2"          "JU1400"     
## [131] "JU1400"      "MY1"         "MY1"         "N2"          "N2"         
## [136] "JU1400"      "JU1400"      "ED3052A"     "ED3052A"     "ED3052B"    
## [141] "ED3052B"     "MY1"         "MY1"         "N2"          "JU1400"     
## [146] "N2"          "JU1400"      "N2"          "JU1400"      "N2"         
## [151] "N2"          "JU1400"      "JU1400"      "MY1"         "MY1"        
## [156] "N2"          "JU1400"      "ED3052A"     "ED3052B"     "MY1"        
## [161] "N2"          "JU1400"      "N2"          "N2"          "JU1400"     
## [166] "JU1400"      "ED3052A"     "ED3052A"     "ED3052B"     "ED3052B"    
## [171] "MY1"         "MY1"         "N2"          "JU1400"      "N2"         
## [176] "JU1400"      "N2"          "JU1400"      "N2"          "N2"         
## [181] "JU1400"      "JU1400"      "MY1"         "MY1"         "N2"         
## [186] "JU1400"      "N2"          "N2"          "JU1400"      "JU1400"     
## [191] "VC40171"     "VC40171"     "AWR144"      "AWR144"      "AWR145"     
## [196] "AWR145"      "N2"          "JU1400"      "N2"          "JU1400"     
## [201] "N2"          "JU1400"      "N2"          "JU1400"      "N2"         
## [206] "JU1400"      "N2"          "JU1400"      "N2"          "JU1400"     
## [211] "N2"          "N2"          "JU1400"      "JU1400"      "VC40171"    
## [216] "VC40171"     "AWR144"      "AWR144"      "AWR145"      "AWR145"     
## [221] "N2-rep1"     "JU1400-rep1" "N2-rep1"     "JU1400-rep1" "N2-rep1"    
## [226] "JU1400-rep1" "N2-rep1"     "JU1400-rep1" "N2"          "N2"         
## [231] "JU1400"      "JU1400"      "VC40171"     "VC40171"     "AWR144"     
## [236] "AWR144"      "AWR145"      "AWR145"      "N2"          "JU1400"     
## [241] "AWR144"      "AWR145"      "N2"          "N2"          "N2"         
## [246] "JU1400"      "JU1400"      "JU1400"      "N2"          "N2"         
## [251] "JU1400"      "JU1400"      "N2"          "JU1400"      "N2"         
## [256] "N2"          "JU1400"      "JU1400"      "AWR144"      "AWR144"     
## [261] "AWR145"      "AWR145"      "N2"          "N2"          "N2"         
## [266] "JU1400"      "JU1400"      "JU1400"      "N2"          "N2"         
## [271] "JU1400"      "JU1400"      "AWR144"      "AWR144"      "AWR145"     
## [276] "AWR145"

As you may have noticed, this method printed the entire Worm_strain column. While it may be useful information for certain aspects, it doesn’t answer our main question of how many different nematode strains were used across our experiments.

The function unique() can help us answer this question by removing duplicated entries, thus living up to its name. It can take in a number of different objects but usually returns an object of the same type that it was given as input.
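
As a small illustration of the "same type in, same type out" behaviour, here is unique() applied to a toy character vector and a toy data frame. Both objects are invented for this sketch.

```r
# A character vector goes in, a character vector comes out
strains <- c("N2", "JU1400", "N2", "MY1", "JU1400")
unique_strains <- unique(strains)
unique_strains            # "N2" "JU1400" "MY1" (order of first appearance)

# A data frame goes in, a data frame with duplicated rows removed comes out
pairs_df <- data.frame(strain = c("N2", "N2", "JU1400", "N2"),
                       spore  = c("LUAm1", "LUAm1", "LUAm1", "ERTm5"))
unique_pairs <- unique(pairs_df)
unique_pairs              # 3 rows remain; the repeated N2/LUAm1 row is dropped
```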

Let’s take a look at using it on our question.

# Retrieve the unique nematode strains from our data set
unique(infection_meta.tbl$Worm_strain)
##  [1] "VC20019"     "N2"          "AB1"         "JU397"       "JU642"      
##  [6] "MY6"         "ED3042"      "JU360"       "JU1400"      "MY1"        
## [11] "Lua1"        "CB4856"      "JU300"       "MY2"         "ED3052A"    
## [16] "ED3052B"     "VC40171"     "AWR144"      "AWR145"      "N2-rep1"    
## [21] "JU1400-rep1"

3.1.4 Review: use length() or str() to retrieve the size of some objects

Note from above that we have only one entry per strain, but how many strains are there in total? Recall from Lecture 01 we used the length() function which does just as it implies by returning the length of a vector, list, or factor. You can also use it to set the length of those objects but it’s not something we have reason to do.

On the other hand str() always gives us the same kind of information plus a little more. Later on, we’ll see that more isn’t always better and that using length() has its advantages.

# Two ways to see how many unique entries we have
# ?length
length(unique(infection_meta.tbl$Worm_strain))
## [1] 21
# or

str(unique(infection_meta.tbl$Worm_strain))
##  chr [1:21] "VC20019" "N2" "AB1" "JU397" "JU642" "MY6" "ED3042" "JU360" ...

Using unique() we are returned a character vector containing 21 C. elegans strains. As you can see, a function like length() returns a single numeric value, which can be very helpful from a programmatic standpoint. The str() function, on the other hand, returns much more human-readable information but is not readily usable as input for other functions.
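
To make the "programmatic standpoint" concrete, the sketch below stores the result of length() and reuses it, then shows that str() only prints. The strains vector is a toy example, not our Worm_strain column.

```r
strains <- c("N2", "JU1400", "N2", "MY1", "JU1400")

# length() returns a single integer that we can store and reuse
n_strains <- length(unique(strains))

# The stored value can go straight into conditions or messages
if (n_strains > 1) {
  message("Found ", n_strains, " distinct strains")
}

# str() prints a summary but returns NULL invisibly, so there is nothing to reuse
str_result <- str(unique(strains))
is.null(str_result)   # TRUE
```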

3.1.5 glimpse() and View() show us our data

Suppose we want to see more of our data frame. There are a couple of choices that can be used inside of RStudio. In this IDE, you have access to your Environment pane which can give you a quick idea of values for variables in your environment, including a bit of what your tibble or data.frame looks like.

Clicking on a data object like infection_meta.tbl will generate a new tab that shows your entire tibble in a human-readable format similar to an Excel spreadsheet. The same result can be accomplished by using the view command View(infection_meta.tbl).

The glimpse() command comes from the dplyr package and brings up a comprehensive summary of your object that looks very similar to the information provided in the Environment pane. You’ll find it looks very much like the str() command but is formatted in a more human-readable way. It tries to provide as much information as possible in a small amount of space.

We can use this command in a code cell so let’s take a glimpse at glimpse().

View(infection_meta.tbl)
# Only works in RStudio

# Let's compare str() to glimpse()
str(infection_meta.tbl)
## spc_tbl_ [276 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:276] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr [1:276] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:276] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:276] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Worm_strain      : chr [1:276] "VC20019" "VC20019" "VC20019" "N2" ...
##  $ Total Worms      : num [1:276] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:276] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:276] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:276] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:276] 0 10 20 0 10 20 0 10 20 0 ...
##  $ Total ul spore   : num [1:276] 0 56.8 113.6 0 56.8 ...
##  $ Infection Round  : num [1:276] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:276] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:276] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:276] 0 0.354 0.708 0 0.354 ...
##  $ Time plated      : num [1:276] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:276] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:276] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:276] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:276] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:276] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:276] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:276] 190513 190513 190513 190430 190513 ...
##  $ Stain type       : chr [1:276] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##  $ Slide date       : num [1:276] 190515 190515 190515 190501 190515 ...
##  $ Slide number     : num [1:276] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Slide Box        : num [1:276] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:276] 190516 190516 190516 190502 190516 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   experiment = col_character(),
##   ..   experimenter = col_character(),
##   ..   description = col_character(),
##   ..   `Infection Date` = col_double(),
##   ..   `Plate Number` = col_double(),
##   ..   Worm_strain = col_character(),
##   ..   `Total Worms` = col_double(),
##   ..   `Spore Strain` = col_character(),
##   ..   `Spore Lot` = col_character(),
##   ..   `Lot concentration` = col_double(),
##   ..   `Total Spores (M)` = col_double(),
##   ..   `Total ul spore` = col_double(),
##   ..   `Infection Round` = col_double(),
##   ..   `40X OP50 (mL)` = col_double(),
##   ..   `Plate Size` = col_double(),
##   ..   `Spores(M)/cm2` = col_double(),
##   ..   `Time plated` = col_double(),
##   ..   `Time Incubated` = col_double(),
##   ..   Temp = col_double(),
##   ..   timepoint = col_character(),
##   ..   infection.type = col_character(),
##   ..   `Fixing Date` = col_double(),
##   ..   Location = col_character(),
##   ..   `Staining Date` = col_double(),
##   ..   `Stain type` = col_character(),
##   ..   `Slide date` = col_double(),
##   ..   `Slide number` = col_double(),
##   ..   `Slide Box` = col_double(),
##   ..   `Imaging Date` = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# glimpse gives us less information overall but is also less redundant
glimpse(infection_meta.tbl)
## Rows: 276
## Columns: 29
## $ experiment          <chr> "190426_VC20019_LUAm1_0M_72hpi", "190426_VC20019_L~
## $ experimenter        <chr> "CM", "CM", "CM", "CM", "CM", "CM", "CM", "CM", "C~
## $ description         <chr> "Wild isolate phenoMIP retest", "Wild isolate phen~
## $ `Infection Date`    <dbl> 190423, 190423, 190423, 190423, 190423, 190423, 19~
## $ `Plate Number`      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,~
## $ Worm_strain         <chr> "VC20019", "VC20019", "VC20019", "N2", "N2", "N2",~
## $ `Total Worms`       <dbl> 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 10~
## $ `Spore Strain`      <chr> "LUAm1", "LUAm1", "LUAm1", "LUAm1", "LUAm1", "LUAm~
## $ `Spore Lot`         <chr> "2A", "2A", "2A", "2A", "2A", "2A", "2A", "2A", "2~
## $ `Lot concentration` <dbl> 176000, 176000, 176000, 176000, 176000, 176000, 17~
## $ `Total Spores (M)`  <dbl> 0, 10, 20, 0, 10, 20, 0, 10, 20, 0, 10, 20, 0, 10,~
## $ `Total ul spore`    <dbl> 0.00000, 56.81818, 113.63636, 0.00000, 56.81818, 1~
## $ `Infection Round`   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,~
## $ `40X OP50 (mL)`     <dbl> 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.15, 0.~
## $ `Plate Size`        <dbl> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
## $ `Spores(M)/cm2`     <dbl> 0.0000000, 0.3538570, 0.7077141, 0.0000000, 0.3538~
## $ `Time plated`       <dbl> 1300, 1300, 1300, 1300, 1300, 1300, 1300, 1300, 13~
## $ `Time Incubated`    <dbl> 1600, 1600, 1600, 1600, 1600, 1600, 1600, 1600, 16~
## $ Temp                <dbl> 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21~
## $ timepoint           <chr> "72", "72", "72", "72", "72", "72", "72", "72", "7~
## $ infection.type      <chr> "continuous", "continuous", "continuous", "continu~
## $ `Fixing Date`       <dbl> 190426, 190426, 190426, 190426, 190426, 190426, 19~
## $ Location            <chr> "Sample exhausted", "Sample exhausted", "Sample ex~
## $ `Staining Date`     <dbl> 190513, 190513, 190513, 190430, 190513, 190513, 19~
## $ `Stain type`        <chr> "Sp.9 FISH + DY96", "Sp.9 FISH + DY96", "Sp.9 FISH~
## $ `Slide date`        <dbl> 190515, 190515, 190515, 190501, 190515, 190515, 19~
## $ `Slide number`      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,~
## $ `Slide Box`         <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,~
## $ `Imaging Date`      <dbl> 190516, 190516, 190516, 190502, 190516, 190516, 19~

So the information provided by glimpse() is more sparse and the formatting is a little tighter. We also don’t have to see the extra column attribute information that str() prints, which can save a lot of vertical space. On the other hand, the command takes longer to type, but that’s a personal choice.

How does dplyr handle column names with spaces? Look at the output from glimpse() above versus our use of str(). The use of glimpse() gives us another peek under the hood by showing us the true names of the columns. Recall we emphasized that whitespace helps the R interpreter to recognize certain code switches.

To access a column with the $ indexing method, the name after the $ normally cannot contain spaces. To get around this limitation, such names are wrapped in the grave accent (`) diacritical (AKA a back-tick) on both sides of the column name, and the tibble displays them this way when necessary. This key is located just to the left of the “1” key along with the “~” symbol.

So if we wanted to access a column like “Fixing Date” we would actually need to use $`Fixing Date` instead! The same idea will apply later when we start working with functions from the dplyr package.
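
Here is a minimal sketch of back-tick access using a toy base R data frame (toy_meta is hypothetical). Setting check.names = FALSE is needed because data.frame() would otherwise rewrite the name as Fixing.Date.

```r
# check.names = FALSE keeps the space in "Fixing Date"
toy_meta <- data.frame(`Fixing Date` = c(190426, 190430),
                       Worm_strain   = c("N2", "JU1400"),
                       check.names   = FALSE)

names(toy_meta)              # "Fixing Date" "Worm_strain"

toy_meta$`Fixing Date`       # back-ticks are required because of the space
toy_meta[["Fixing Date"]]    # a quoted string also works with [[ ]]
toy_meta$Worm_strain         # names without spaces need no back-ticks
```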


3.2.0 Special data: NA and NaN values

What happens when you import data with missing values? These could be empty entries in a CSV file or blank cells in an xlsx file. Perhaps, as we’ll see later, it could be a specifically annotated entry like “No_Data”. These are usually the result of missing data points from an experiment but could have other origins, such as low-threshold values, depending on the source of your data.

Missing values in R are represented by NA (Not Available). Undefined numerical results (like 0/0) are represented by NaN (Not a Number); dividing a non-zero number by zero gives Inf or -Inf instead. NA and NaN can both be considered null values. They need special handling, especially NAs; otherwise they can lead to errors or unexpected results in functions that we frequently use.
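
A quick sketch of how base R distinguishes these values, using is.na() and is.nan(). The variable names are only for illustration.

```r
missing_value    <- NA
undefined_result <- 0 / 0    # NaN: the result is mathematically undefined
infinite_result  <- 1 / 0    # Inf, not NaN

is.na(missing_value)         # TRUE
is.nan(missing_value)        # FALSE: NA is not NaN
is.na(undefined_result)      # TRUE: is.na() also flags NaN
is.nan(undefined_result)     # TRUE
is.nan(infinite_result)      # FALSE

# Comparing with == does not work for missing values; use is.na() instead
NA == NA                     # NA
```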

Let us begin by building an example containing NA values.

# Set up some vectors for a data.frame
modern_domain <- c("Archaea", "Bacteria", "Eukarya", NA, NA)
five_domains <- c("Archaea", "Bacteria", "Eukarya", "Virusobiota", "Prionobiota")
six_kingdoms <- c(1, 1, 4, NA, NA)

# Put it all together in a call to data.frame()
NA_example <- data.frame(five_domains, modern_domain, six_kingdoms)

# Look at our data frame
NA_example
##   five_domains modern_domain six_kingdoms
## 1      Archaea       Archaea            1
## 2     Bacteria      Bacteria            1
## 3      Eukarya       Eukarya            4
## 4  Virusobiota          <NA>           NA
## 5  Prionobiota          <NA>           NA
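
The data frame above only contains NA values. As a complementary sketch, the snippet below shows how NaN arises and how is.na() and is.nan() treat the two kinds of values differently. The variable names are made up for illustration.

```r
# 0/0 has no defined value, so R returns NaN
zero_by_zero <- 0 / 0

# Dividing a non-zero number by zero gives Inf, not NaN
one_by_zero <- 1 / 0

# Collect an ordinary number, an NA, a NaN and an Inf in one vector
special_values <- c(1, NA, zero_by_zero, one_by_zero)

# is.na() is TRUE for both NA and NaN
is.na(special_values)

# is.nan() is TRUE only for NaN
is.nan(special_values)
```

In practice this means a check with is.na() catches both kinds of null value, while is.nan() is only useful when the two need to be told apart.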

3.2.1 Some functions can be told to ignore or remove NA values

R will not abide an NA value when completing a calculation. If it does encounter an NA then it will return an NA. Some mathematical functions, however, can ignore NA values by explicitly setting the logical parameter na.rm = TRUE. Under the hood, if the function recognizes this parameter, it will remove the NA values before proceeding to perform its mathematical operation.

If you fail to set this parameter correctly, then the function may return an NA value.

# Use the sum() function and see what happens with NA values
sum(six_kingdoms) # some functions need to be explicitly told what to do with NAs. No errors though!
## [1] NA
sum(six_kingdoms, na.rm = TRUE) #Avoid using just "T" as an abbreviation for "TRUE"
## [1] 6
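
The advice about avoiding "T" can be illustrated with a short sketch. TRUE is a reserved word in R, whereas T is just an ordinary variable that happens to hold TRUE and can be overwritten. The vector is recreated here so the snippet runs on its own; shadowed_sum is a made-up name.

```r
six_kingdoms <- c(1, 1, 4, NA, NA)

# na.rm works the same way for other summary functions
mean(six_kingdoms, na.rm = TRUE)
max(six_kingdoms, na.rm = TRUE)

# T is an ordinary variable and can be reassigned (TRUE cannot)
T <- FALSE

# na.rm is now effectively FALSE, so the NA values are NOT removed
shadowed_sum <- sum(six_kingdoms, na.rm = T)
shadowed_sum

# Remove our copy so that T refers to the built-in value again
rm(T)
```

Because shadowed_sum comes back as NA without any warning, this kind of mistake can be hard to spot, which is why spelling out TRUE is the safer habit.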

3.2.2 What happens when we try to use functions via apply() on data with NAs?

Let’s recreate the counts data from Lecture 01 and add a few NAs. If I now use the apply() function to calculate the mean number of counts across each row (i.e., each gene), I will get NA as an answer for the rows that contain NAs.

counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
                     Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
                     Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))

counts
##       Site1 Site2 Site3
## geneA     2    15    10
## geneB     4    NA     7
## geneC    12    27    13
## geneD     8    28    NA
# Notice that here we pass only the function name "mean" without any of its parameters
apply(X = counts, MARGIN = 1, FUN = mean)
##    geneA    geneB    geneC    geneD 
##  9.00000       NA 17.33333       NA

Recall: we can pass additional arguments to apply() that are handed on as parameters to our function FUN. So all we have to do is update the code to include the na.rm = TRUE parameter.

# Pass parameters in our call
apply(X = counts, MARGIN = 1, 
      FUN = mean, na.rm = TRUE)
##    geneA    geneB    geneC    geneD 
##  9.00000  5.50000 17.33333 18.00000
# Equivalent code - perhaps clearer but more verbose
apply(X = counts, MARGIN = 1, 
      FUN = function(x) mean(x, na.rm = TRUE))
##    geneA    geneB    geneC    geneD 
##  9.00000  5.50000 17.33333 18.00000
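
For the specific case of row means, base R also provides rowMeans(), which accepts na.rm directly. The sketch below is an alternative to the apply() calls above rather than part of the original lecture code; it recreates counts so that it runs on its own, and the names row_means_apply and row_means_base are made up.

```r
counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
                     Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
                     Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))

# Row means with apply(), passing na.rm on to mean()
row_means_apply <- apply(X = counts, MARGIN = 1, FUN = mean, na.rm = TRUE)

# Row means with the dedicated base R function
row_means_base <- rowMeans(counts, na.rm = TRUE)

# Both approaches should agree
all.equal(row_means_apply, row_means_base)
```

The apply() form is the more general one, since FUN can be any function; rowMeans() is simply a shortcut for this one calculation.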

3.2.3 Use the is.na() function to check your data

How do we find out ahead of time that we are missing data? Knowing is half the battle and is.na() can help us determine this with some data structures. The is.na() function can search through data structures and return a logical data structure of the same dimensions.

With a vector we can easily see how some basic functions work.

# Let's check out this vector that contains NA values
na_vector <- c(5, 6, NA, 7, 7, NA)

# This works on vectors...
is.na(na_vector)
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
# and data.frames too!
is.na(counts)
##       Site1 Site2 Site3
## geneA FALSE FALSE FALSE
## geneB FALSE  TRUE FALSE
## geneC FALSE FALSE FALSE
## geneD FALSE FALSE  TRUE
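
A full logical matrix quickly becomes hard to read as the data grows. As a hedged sketch, the logical output of is.na() can be summarised instead of printed in full. The snippet recreates counts so that it runs on its own; na_per_column, na_per_row and na_positions are made-up names.

```r
counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
                     Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
                     Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))

# Is there at least one NA anywhere in the data?
any(is.na(counts))

# TRUE counts as 1, so summing the logical matrix counts NAs per column or row
na_per_column <- colSums(is.na(counts))
na_per_row <- rowSums(is.na(counts))
na_per_column
na_per_row

# Row and column positions of each NA
na_positions <- which(is.na(counts), arr.ind = TRUE)
na_positions

# Which rows have no missing values at all?
complete.cases(counts)
```

Summaries like these scale to large tables, where scanning every TRUE and FALSE by eye is not practical.
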
# Let's look at our infection metadata for NA values
is.na(infection_meta.tbl)
##        experiment experimenter description Infection Date Plate Number
##   [1,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [2,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [3,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [4,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [5,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [6,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [7,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [8,]      FALSE        FALSE       FALSE          FALSE        FALSE
##   [9,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [10,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [11,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [12,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [13,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [14,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [15,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [16,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [17,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [18,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [19,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [20,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [21,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [22,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [23,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [24,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [25,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [26,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [27,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [28,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [29,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [30,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [31,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [32,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [33,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [34,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [35,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [36,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [37,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [38,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [39,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [40,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [41,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [42,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [43,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [44,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [45,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [46,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [47,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [48,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [49,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [50,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [51,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [52,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [53,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [54,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [55,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [56,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [57,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [58,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [59,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [60,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [61,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [62,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [63,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [64,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [65,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [66,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [67,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [68,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [69,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [70,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [71,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [72,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [73,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [74,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [75,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [76,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [77,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [78,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [79,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [80,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [81,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [82,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [83,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [84,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [85,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [86,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [87,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [88,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [89,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [90,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [91,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [92,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [93,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [94,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [95,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [96,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [97,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [98,]      FALSE        FALSE       FALSE          FALSE        FALSE
##  [99,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [100,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [101,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [102,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [103,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [104,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [105,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [106,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [107,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [108,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [109,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [110,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [111,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [112,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [113,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [114,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [115,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [116,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [117,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [118,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [119,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [120,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [121,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [122,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [123,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [124,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [125,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [126,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [127,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [128,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [129,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [130,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [131,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [132,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [133,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [134,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [135,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [136,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [137,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [138,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [139,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [140,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [141,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [142,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [143,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [144,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [145,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [146,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [147,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [148,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [149,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [150,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [151,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [152,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [153,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [154,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [155,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [156,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [157,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [158,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [159,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [160,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [161,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [162,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [163,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [164,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [165,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [166,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [167,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [168,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [169,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [170,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [171,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [172,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [173,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [174,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [175,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [176,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [177,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [178,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [179,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [180,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [181,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [182,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [183,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [184,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [185,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [186,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [187,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [188,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [189,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [190,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [191,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [192,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [193,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [194,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [195,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [196,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [197,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [198,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [199,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [200,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [201,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [202,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [203,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [204,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [205,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [206,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [207,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [208,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [209,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [210,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [211,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [212,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [213,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [214,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [215,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [216,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [217,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [218,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [219,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [220,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [221,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [222,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [223,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [224,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [225,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [226,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [227,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [228,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [229,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [230,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [231,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [232,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [233,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [234,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [235,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [236,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [237,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [238,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [239,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [240,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [241,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [242,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [243,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [244,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [245,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [246,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [247,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [248,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [249,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [250,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [251,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [252,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [253,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [254,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [255,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [256,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [257,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [258,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [259,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [260,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [261,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [262,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [263,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [264,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [265,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [266,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [267,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [268,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [269,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [270,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [271,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [272,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [273,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [274,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [275,]      FALSE        FALSE       FALSE          FALSE        FALSE
## [276,]      FALSE        FALSE       FALSE          FALSE        FALSE
##        Worm_strain Total Worms Spore Strain Spore Lot Lot concentration
##   [1,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [2,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [3,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [4,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [5,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [6,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [7,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [8,]       FALSE       FALSE        FALSE     FALSE             FALSE
##   [9,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [10,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [11,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [12,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [13,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [14,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [15,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [16,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [17,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [18,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [19,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [20,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [21,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [22,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [23,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [24,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [25,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [26,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [27,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [28,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [29,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [30,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [31,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [32,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [33,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [34,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [35,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [36,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [37,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [38,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [39,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [40,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [41,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [42,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [43,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [44,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [45,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [46,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [47,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [48,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [49,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [50,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [51,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [52,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [53,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [54,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [55,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [56,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [57,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [58,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [59,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [60,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [61,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [62,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [63,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [64,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [65,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [66,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [67,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [68,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [69,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [70,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [71,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [72,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [73,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [74,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [75,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [76,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [77,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [78,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [79,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [80,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [81,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [82,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [83,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [84,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [85,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [86,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [87,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [88,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [89,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [90,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [91,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [92,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [93,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [94,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [95,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [96,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [97,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [98,]       FALSE       FALSE        FALSE     FALSE             FALSE
##  [99,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [100,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [101,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [102,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [103,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [104,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [105,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [106,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [107,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [108,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [109,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [110,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [111,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [112,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [113,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [114,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [115,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [116,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [117,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [118,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [119,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [120,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [121,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [122,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [123,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [124,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [125,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [126,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [127,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [128,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [129,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [130,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [131,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [132,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [133,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [134,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [135,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [136,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [137,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [138,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [139,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [140,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [141,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [142,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [143,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [144,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [145,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [146,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [147,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [148,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [149,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [150,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [151,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [152,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [153,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [154,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [155,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [156,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [157,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [158,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [159,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [160,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [161,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [162,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [163,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [164,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [165,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [166,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [167,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [168,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [169,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [170,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [171,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [172,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [173,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [174,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [175,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [176,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [177,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [178,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [179,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [180,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [181,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [182,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [183,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [184,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [185,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [186,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [187,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [188,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [189,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [190,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [191,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [192,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [193,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [194,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [195,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [196,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [197,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [198,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [199,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [200,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [201,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [202,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [203,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [204,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [205,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [206,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [207,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [208,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [209,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [210,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [211,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [212,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [213,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [214,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [215,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [216,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [217,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [218,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [219,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [220,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [221,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [222,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [223,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [224,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [225,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [226,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [227,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [228,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [229,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [230,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [231,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [232,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [233,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [234,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [235,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [236,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [237,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [238,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [239,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [240,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [241,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [242,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [243,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [244,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [245,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [246,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [247,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [248,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [249,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [250,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [251,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [252,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [253,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [254,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [255,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [256,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [257,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [258,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [259,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [260,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [261,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [262,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [263,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [264,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [265,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [266,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [267,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [268,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [269,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [270,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [271,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [272,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [273,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [274,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [275,]       FALSE       FALSE        FALSE     FALSE             FALSE
## [276,]       FALSE       FALSE        FALSE     FALSE             FALSE
##        Total Spores (M) Total ul spore Infection Round 40X OP50 (mL) Plate Size
##   [1,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [2,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [3,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [4,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [5,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [6,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [7,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [8,]            FALSE          FALSE           FALSE         FALSE      FALSE
##   [9,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [10,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [11,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [12,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [13,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [14,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [15,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [16,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [17,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [18,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [19,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [20,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [21,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [22,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [23,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [24,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [25,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [26,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [27,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [28,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [29,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [30,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [31,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [32,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [33,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [34,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [35,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [36,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [37,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [38,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [39,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [40,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [41,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [42,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [43,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [44,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [45,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [46,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [47,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [48,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [49,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [50,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [51,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [52,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [53,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [54,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [55,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [56,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [57,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [58,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [59,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [60,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [61,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [62,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [63,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [64,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [65,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [66,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [67,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [68,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [69,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [70,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [71,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [72,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [73,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [74,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [75,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [76,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [77,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [78,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [79,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [80,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [81,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [82,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [83,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [84,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [85,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [86,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [87,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [88,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [89,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [90,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [91,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [92,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [93,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [94,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [95,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [96,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [97,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [98,]            FALSE          FALSE           FALSE         FALSE      FALSE
##  [99,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [100,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [101,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [102,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [103,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [104,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [105,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [106,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [107,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [108,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [109,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [110,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [111,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [112,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [113,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [114,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [115,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [116,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [117,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [118,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [119,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [120,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [121,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [122,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [123,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [124,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [125,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [126,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [127,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [128,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [129,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [130,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [131,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [132,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [133,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [134,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [135,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [136,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [137,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [138,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [139,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [140,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [141,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [142,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [143,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [144,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [145,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [146,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [147,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [148,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [149,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [150,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [151,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [152,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [153,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [154,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [155,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [156,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [157,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [158,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [159,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [160,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [161,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [162,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [163,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [164,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [165,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [166,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [167,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [168,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [169,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [170,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [171,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [172,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [173,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [174,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [175,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [176,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [177,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [178,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [179,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [180,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [181,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [182,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [183,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [184,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [185,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [186,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [187,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [188,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [189,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [190,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [191,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [192,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [193,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [194,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [195,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [196,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [197,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [198,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [199,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [200,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [201,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [202,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [203,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [204,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [205,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [206,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [207,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [208,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [209,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [210,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [211,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [212,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [213,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [214,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [215,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [216,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [217,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [218,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [219,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [220,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [221,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [222,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [223,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [224,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [225,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [226,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [227,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [228,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [229,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [230,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [231,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [232,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [233,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [234,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [235,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [236,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [237,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [238,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [239,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [240,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [241,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [242,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [243,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [244,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [245,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [246,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [247,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [248,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [249,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [250,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [251,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [252,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [253,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [254,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [255,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [256,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [257,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [258,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [259,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [260,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [261,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [262,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [263,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [264,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [265,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [266,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [267,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [268,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [269,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [270,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [271,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [272,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [273,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [274,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [275,]            FALSE          FALSE           FALSE         FALSE      FALSE
## [276,]            FALSE          FALSE           FALSE         FALSE      FALSE
##        Spores(M)/cm2 Time plated Time Incubated  Temp timepoint infection.type
##   [1,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [2,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [3,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [4,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [5,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [6,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [7,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [8,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##   [9,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [10,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [11,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [12,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [13,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [14,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [15,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [16,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [17,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [18,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [19,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [20,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [21,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [22,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [23,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [24,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [25,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [26,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [27,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [28,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [29,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [30,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [31,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [32,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [33,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [34,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [35,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [36,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [37,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [38,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [39,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [40,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [41,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [42,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [43,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [44,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [45,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [46,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [47,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [48,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [49,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [50,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [51,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [52,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [53,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [54,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [55,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [56,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [57,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [58,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [59,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [60,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [61,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [62,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [63,]         FALSE       FALSE          FALSE FALSE     FALSE          FALSE
##  [64,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [65,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [66,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [67,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [68,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [69,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [70,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [71,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [72,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [73,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [74,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [75,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [76,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [77,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [78,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [79,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [80,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [81,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [82,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [83,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [84,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [85,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [86,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [87,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [88,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [89,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [90,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [91,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [92,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [93,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [94,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [95,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [96,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [97,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [98,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
##  [99,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [100,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [101,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [102,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [103,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [104,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [105,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [106,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [107,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [108,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [109,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [110,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [111,]         FALSE        TRUE           TRUE FALSE     FALSE          FALSE
## [112,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [113,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [114,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [115,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [116,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [117,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [118,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [119,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [120,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [121,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [122,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [123,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [124,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [125,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [126,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [127,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [128,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [129,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [130,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [131,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [132,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [133,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [134,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [135,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [136,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [137,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [138,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [139,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [140,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [141,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [142,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [143,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [144,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [145,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [146,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [147,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [148,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [149,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [150,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [151,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [152,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [153,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [154,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [155,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [156,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [157,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [158,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [159,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [160,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [161,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [162,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [163,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [164,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [165,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [166,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [167,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [168,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [169,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [170,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [171,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [172,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [173,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [174,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [175,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [176,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [177,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [178,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [179,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [180,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [181,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [182,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [183,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [184,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [185,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [186,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [187,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [188,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [189,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [190,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [191,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [192,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [193,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [194,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [195,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [196,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [197,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [198,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [199,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [200,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [201,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [202,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [203,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [204,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [205,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [206,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [207,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [208,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [209,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [210,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [211,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [212,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [213,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [214,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [215,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [216,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [217,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [218,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [219,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [220,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [221,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [222,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [223,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [224,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [225,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [226,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [227,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [228,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [229,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [230,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [231,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [232,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [233,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [234,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [235,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [236,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [237,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [238,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [239,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [240,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [241,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [242,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [243,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [244,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [245,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [246,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [247,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [248,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [249,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [250,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [251,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [252,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [253,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [254,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [255,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [256,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [257,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [258,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [259,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [260,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [261,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [262,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [263,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [264,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [265,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [266,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [267,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [268,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [269,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [270,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [271,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [272,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [273,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [274,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [275,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
## [276,]         FALSE        TRUE          FALSE FALSE     FALSE          FALSE
##        Fixing Date Location Staining Date Stain type Slide date Slide number
##   [1,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [2,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [3,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [4,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [5,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [6,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [7,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [8,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##   [9,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [10,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [11,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [12,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [13,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [14,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [15,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [16,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [17,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [18,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [19,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [20,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [21,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [22,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [23,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [24,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [25,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [26,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [27,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [28,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [29,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [30,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [31,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [32,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [33,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [34,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [35,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [36,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [37,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [38,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [39,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [40,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [41,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [42,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [43,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [44,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [45,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [46,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [47,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [48,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [49,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [50,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [51,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [52,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [53,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [54,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [55,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [56,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [57,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [58,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [59,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [60,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [61,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [62,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [63,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [64,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [65,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [66,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [67,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [68,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [69,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [70,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [71,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [72,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [73,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [74,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [75,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [76,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [77,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [78,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [79,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [80,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [81,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [82,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [83,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [84,]       FALSE     TRUE         FALSE      FALSE      FALSE        FALSE
##  [85,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [86,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [87,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [88,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [89,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [90,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [91,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [92,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [93,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [94,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [95,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [96,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [97,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [98,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##  [99,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [100,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [101,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [102,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [103,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [104,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [105,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [106,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [107,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [108,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [109,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [110,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [111,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [112,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [113,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [114,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [115,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [116,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [117,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [118,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [119,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [120,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [121,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [122,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [123,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [124,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [125,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [126,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [127,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [128,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [129,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [130,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [131,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [132,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [133,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [134,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [135,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [136,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [137,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [138,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [139,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [140,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [141,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [142,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [143,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [144,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [145,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [146,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [147,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [148,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [149,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [150,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [151,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [152,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [153,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [154,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [155,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [156,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [157,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [158,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [159,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [160,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [161,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [162,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [163,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [164,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [165,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [166,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [167,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [168,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [169,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [170,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [171,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [172,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [173,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [174,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [175,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [176,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [177,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [178,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [179,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [180,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [181,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [182,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [183,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [184,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [185,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [186,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [187,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [188,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [189,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [190,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [191,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [192,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [193,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [194,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [195,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [196,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [197,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [198,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [199,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [200,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [201,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [202,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [203,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [204,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [205,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [206,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [207,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [208,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [209,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [210,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [211,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [212,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [213,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [214,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [215,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [216,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [217,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [218,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [219,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [220,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [221,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [222,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [223,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [224,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [225,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [226,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [227,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [228,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [229,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [230,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [231,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [232,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [233,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [234,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [235,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [236,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [237,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [238,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [239,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [240,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [241,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [242,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [243,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [244,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [245,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [246,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [247,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [248,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [249,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [250,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [251,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [252,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [253,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [254,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [255,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [256,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [257,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [258,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [259,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [260,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [261,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [262,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [263,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [264,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [265,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [266,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [267,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [268,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [269,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [270,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [271,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [272,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [273,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [274,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [275,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
## [276,]       FALSE    FALSE         FALSE      FALSE      FALSE        FALSE
##        Slide Box Imaging Date
##   [1,]     FALSE        FALSE
##   [2,]     FALSE        FALSE
##   [3,]     FALSE        FALSE
##   [4,]     FALSE        FALSE
##   [5,]     FALSE        FALSE
##   [6,]     FALSE        FALSE
##   [7,]     FALSE        FALSE
##   [8,]     FALSE        FALSE
##   [9,]     FALSE        FALSE
##  [10,]     FALSE        FALSE
##  [11,]     FALSE        FALSE
##  [12,]     FALSE        FALSE
##  [13,]     FALSE        FALSE
##  [14,]     FALSE        FALSE
##  [15,]     FALSE        FALSE
##  [16,]     FALSE        FALSE
##  [17,]     FALSE        FALSE
##  [18,]     FALSE        FALSE
##  [19,]     FALSE        FALSE
##  [20,]     FALSE        FALSE
##  [21,]     FALSE        FALSE
##  [22,]     FALSE        FALSE
##  [23,]     FALSE        FALSE
##  [24,]     FALSE        FALSE
##  [25,]     FALSE        FALSE
##  [26,]     FALSE        FALSE
##  [27,]     FALSE        FALSE
##  [28,]     FALSE        FALSE
##  [29,]     FALSE        FALSE
##  [30,]     FALSE        FALSE
##  [31,]     FALSE        FALSE
##  [32,]     FALSE        FALSE
##  [33,]     FALSE        FALSE
##  [34,]     FALSE        FALSE
##  [35,]     FALSE        FALSE
##  [36,]     FALSE        FALSE
##  [37,]     FALSE        FALSE
##  [38,]     FALSE        FALSE
##  [39,]     FALSE        FALSE
##  [40,]     FALSE        FALSE
##  [41,]     FALSE        FALSE
##  [42,]     FALSE        FALSE
##  [43,]     FALSE        FALSE
##  [44,]     FALSE        FALSE
##  [45,]     FALSE        FALSE
##  [46,]     FALSE        FALSE
##  [47,]     FALSE        FALSE
##  [48,]     FALSE        FALSE
##  [49,]     FALSE        FALSE
##  [50,]     FALSE        FALSE
##  [51,]     FALSE        FALSE
##  [52,]     FALSE        FALSE
##  [53,]     FALSE        FALSE
##  [54,]     FALSE        FALSE
##  [55,]     FALSE        FALSE
##  [56,]     FALSE        FALSE
##  [57,]     FALSE        FALSE
##  [58,]     FALSE        FALSE
##  [59,]     FALSE        FALSE
##  [60,]     FALSE        FALSE
##  [61,]     FALSE        FALSE
##  [62,]     FALSE        FALSE
##  [63,]     FALSE        FALSE
##  [64,]     FALSE        FALSE
##  [65,]     FALSE        FALSE
##  [66,]     FALSE        FALSE
##  [67,]     FALSE        FALSE
##  [68,]     FALSE        FALSE
##  [69,]     FALSE        FALSE
##  [70,]     FALSE        FALSE
##  [71,]     FALSE        FALSE
##  [72,]     FALSE        FALSE
##  [73,]     FALSE        FALSE
##  [74,]     FALSE        FALSE
##  [75,]     FALSE        FALSE
##  [76,]     FALSE        FALSE
##  [77,]     FALSE        FALSE
##  [78,]     FALSE        FALSE
##  [79,]     FALSE        FALSE
##  [80,]     FALSE        FALSE
##  [81,]     FALSE        FALSE
##  [82,]     FALSE        FALSE
##  [83,]     FALSE        FALSE
##  [84,]     FALSE        FALSE
##  [85,]     FALSE        FALSE
##  [86,]     FALSE        FALSE
##  [87,]     FALSE        FALSE
##  [88,]     FALSE        FALSE
##  [89,]     FALSE        FALSE
##  [90,]     FALSE        FALSE
##  [91,]     FALSE        FALSE
##  [92,]     FALSE        FALSE
##  [93,]     FALSE        FALSE
##  [94,]     FALSE        FALSE
##  [95,]     FALSE        FALSE
##  [96,]     FALSE        FALSE
##  [97,]     FALSE        FALSE
##  [98,]     FALSE        FALSE
##  [99,]     FALSE        FALSE
## [100,]     FALSE        FALSE
## [101,]     FALSE        FALSE
## [102,]     FALSE        FALSE
## [103,]     FALSE        FALSE
## [104,]     FALSE        FALSE
## [105,]     FALSE        FALSE
## [106,]     FALSE        FALSE
## [107,]     FALSE        FALSE
## [108,]     FALSE        FALSE
## [109,]     FALSE        FALSE
## [110,]     FALSE        FALSE
## [111,]     FALSE        FALSE
## [112,]     FALSE        FALSE
## [113,]     FALSE        FALSE
## [114,]     FALSE        FALSE
## [115,]     FALSE        FALSE
## [116,]     FALSE        FALSE
## [117,]     FALSE        FALSE
## [118,]     FALSE        FALSE
## [119,]     FALSE        FALSE
## [120,]     FALSE        FALSE
## [121,]     FALSE        FALSE
## [122,]     FALSE        FALSE
## [123,]     FALSE        FALSE
## [124,]     FALSE        FALSE
## [125,]     FALSE        FALSE
## [126,]     FALSE        FALSE
## [127,]     FALSE        FALSE
## [128,]     FALSE        FALSE
## [129,]     FALSE        FALSE
## [130,]     FALSE        FALSE
## [131,]     FALSE        FALSE
## [132,]     FALSE        FALSE
## [133,]     FALSE        FALSE
## [134,]     FALSE        FALSE
## [135,]     FALSE        FALSE
## [136,]     FALSE        FALSE
## [137,]     FALSE        FALSE
## [138,]     FALSE        FALSE
## [139,]     FALSE        FALSE
## [140,]     FALSE        FALSE
## [141,]     FALSE        FALSE
## [142,]     FALSE        FALSE
## [143,]     FALSE        FALSE
## [144,]     FALSE        FALSE
## [145,]     FALSE        FALSE
## [146,]     FALSE        FALSE
## [147,]     FALSE        FALSE
## [148,]     FALSE        FALSE
## [149,]     FALSE        FALSE
## [150,]     FALSE        FALSE
## [151,]     FALSE        FALSE
## [152,]     FALSE        FALSE
## [153,]     FALSE        FALSE
## [154,]     FALSE        FALSE
## [155,]     FALSE        FALSE
## [156,]     FALSE        FALSE
## [157,]     FALSE        FALSE
## [158,]     FALSE        FALSE
## [159,]     FALSE        FALSE
## [160,]     FALSE        FALSE
## [161,]     FALSE        FALSE
## [162,]     FALSE        FALSE
## [163,]     FALSE        FALSE
## [164,]     FALSE        FALSE
## [165,]     FALSE        FALSE
## [166,]     FALSE        FALSE
## [167,]     FALSE        FALSE
## [168,]     FALSE        FALSE
## [169,]     FALSE        FALSE
## [170,]     FALSE        FALSE
## [171,]     FALSE        FALSE
## [172,]     FALSE        FALSE
## [173,]     FALSE        FALSE
## [174,]     FALSE        FALSE
## [175,]     FALSE        FALSE
## [176,]     FALSE        FALSE
## [177,]     FALSE        FALSE
## [178,]     FALSE        FALSE
## [179,]     FALSE        FALSE
## [180,]     FALSE        FALSE
## [181,]     FALSE        FALSE
## [182,]     FALSE        FALSE
## [183,]     FALSE        FALSE
## [184,]     FALSE        FALSE
## [185,]     FALSE        FALSE
## [186,]     FALSE        FALSE
## [187,]     FALSE        FALSE
## [188,]     FALSE        FALSE
## [189,]     FALSE        FALSE
## [190,]     FALSE        FALSE
## [191,]     FALSE        FALSE
## [192,]     FALSE        FALSE
## [193,]     FALSE        FALSE
## [194,]     FALSE        FALSE
## [195,]     FALSE        FALSE
## [196,]     FALSE        FALSE
## [197,]     FALSE        FALSE
## [198,]     FALSE        FALSE
## [199,]     FALSE        FALSE
## [200,]     FALSE        FALSE
## [201,]     FALSE        FALSE
## [202,]     FALSE        FALSE
## [203,]     FALSE        FALSE
## [204,]     FALSE        FALSE
## [205,]     FALSE        FALSE
## [206,]     FALSE        FALSE
## [207,]     FALSE        FALSE
## [208,]     FALSE        FALSE
## [209,]     FALSE        FALSE
## [210,]     FALSE        FALSE
## [211,]     FALSE        FALSE
## [212,]     FALSE        FALSE
## [213,]     FALSE        FALSE
## [214,]     FALSE        FALSE
## [215,]     FALSE        FALSE
## [216,]     FALSE        FALSE
## [217,]     FALSE        FALSE
## [218,]     FALSE        FALSE
## [219,]     FALSE        FALSE
## [220,]     FALSE        FALSE
## [221,]     FALSE        FALSE
## [222,]     FALSE        FALSE
## [223,]     FALSE        FALSE
## [224,]     FALSE        FALSE
## [225,]     FALSE        FALSE
## [226,]     FALSE        FALSE
## [227,]     FALSE        FALSE
## [228,]     FALSE        FALSE
## [229,]     FALSE        FALSE
## [230,]     FALSE        FALSE
## [231,]     FALSE        FALSE
## [232,]     FALSE        FALSE
## [233,]     FALSE        FALSE
## [234,]     FALSE        FALSE
## [235,]     FALSE        FALSE
## [236,]     FALSE        FALSE
## [237,]     FALSE        FALSE
## [238,]     FALSE        FALSE
## [239,]     FALSE        FALSE
## [240,]     FALSE        FALSE
## [241,]     FALSE        FALSE
## [242,]     FALSE        FALSE
## [243,]     FALSE        FALSE
## [244,]     FALSE        FALSE
## [245,]     FALSE        FALSE
## [246,]     FALSE        FALSE
## [247,]     FALSE        FALSE
## [248,]     FALSE        FALSE
## [249,]     FALSE        FALSE
## [250,]     FALSE        FALSE
## [251,]     FALSE        FALSE
## [252,]     FALSE        FALSE
## [253,]     FALSE        FALSE
## [254,]     FALSE        FALSE
## [255,]     FALSE        FALSE
## [256,]     FALSE        FALSE
## [257,]     FALSE        FALSE
## [258,]     FALSE        FALSE
## [259,]     FALSE        FALSE
## [260,]     FALSE        FALSE
## [261,]     FALSE        FALSE
## [262,]     FALSE        FALSE
## [263,]     FALSE        FALSE
## [264,]     FALSE        FALSE
## [265,]     FALSE        FALSE
## [266,]     FALSE        FALSE
## [267,]     FALSE        FALSE
## [268,]     FALSE        FALSE
## [269,]     FALSE        FALSE
## [270,]     FALSE        FALSE
## [271,]     FALSE        FALSE
## [272,]     FALSE        FALSE
## [273,]     FALSE        FALSE
## [274,]     FALSE        FALSE
## [275,]     FALSE        FALSE
## [276,]     FALSE        FALSE

3.2.4 The any() function evaluates logical vectors

With large data frames, as you can see, there are just too many entries to inspect by eye. Sometimes we are only interested in knowing whether at least one of our logical values is TRUE. That is accomplished with the any() function, which takes one or more logical vectors (or a logical matrix, such as the output of is.na() on a data.frame) and returns a single TRUE if at least one value is TRUE, and FALSE otherwise.

We can use it to quickly ask if our infection_meta.tbl data frame has any NA values.

# Before we dig too deep, can we check if there are ANY NA values in our data.frame?
any(is.na(infection_meta.tbl)) # logical (TRUE or FALSE).
## [1] TRUE
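As a small illustration of how any() relates to other summaries of is.na(), here is a minimal sketch on a made-up data frame. The object toy_df and its columns are hypothetical and are not part of the course data; sum() and colSums() are base R functions that count TRUE values overall and per column.

```r
# A small, hypothetical data frame with a few missing values
toy_df <- data.frame(plate  = 1:4,
                     dose   = c(0, 10, NA, 20),
                     strain = c("N2", NA, NA, "VC20019"))

# Is there at least one NA anywhere? Returns a single logical value
has_na <- any(is.na(toy_df))

# How many NA values are there in total? TRUE counts as 1, FALSE as 0
total_na <- sum(is.na(toy_df))

# How many NA values are there in each column?
na_per_column <- colSums(is.na(toy_df))

has_na
total_na
na_per_column
```

any() only answers yes or no, while sum() and colSums() tell you how many values are missing and in which columns.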

Now we’ve confirmed that there is at least one NA value in our data. Given that there are 276 rows with 29 columns (8004 total entries), we need a way to identify which rows contain NA values and, conversely, which rows do not. Let’s start with simple structures.

3.2.5 Find what you’re looking for with the which() function

Using is.na() returned a logical structure of the same size as our input, telling us whether or not each value was NA. There are several ways to apply this information through other functions (as we saw with any()), but a useful method for a vector of logicals is to ask which() indices (positions) are TRUE.

In our case, we use which() after checking for NA values in our object.

# Take a look at na_vector before you start manipulating it
na_vector 
## [1]  5  6 NA  7  7 NA
# wrap which() around our is.na() call
which(is.na(na_vector))
## [1] 3 6
# save the indices where NAs are present in na_vector
na_positions <- which(is.na(na_vector)) 

From above, we see that our NA values are located at indices 3 and 6!
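which() is not limited to is.na(): it accepts any logical vector and silently skips positions where the condition is NA. The sketch below uses a hypothetical copy of the vector (na_vector_demo) so that it runs on its own.

```r
# A stand-alone copy of the example vector (hypothetical name)
na_vector_demo <- c(5, 6, NA, 7, 7, NA)

# Comparing against NA gives NA, not TRUE or FALSE
is_large <- na_vector_demo > 6
is_large

# which() reports only the positions that are TRUE and ignores the NAs
large_positions <- which(is_large)
large_positions

# Indexing by position returns just the matching values
large_values <- na_vector_demo[large_positions]
large_values
```

Indexing directly with is_large would also return an NA for every NA in the condition, which is one reason to wrap a comparison in which() when the data may contain missing values.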

3.2.6 Apply the results of which() to filter your data!

Now that we have the results from our which() call, we know exactly which indices have NA values. We can apply this directly to our original na_vector object to retrieve the non-NA values using the - (exclusion) syntax.

# cut out the indices stored in na_positions
removed_na_vector_1 <- na_vector[-na_positions]

# Check out the result
removed_na_vector_1
## [1] 5 6 7 7

3.2.6.1 Use the exclamation mark, !, to invert your logical vectors

Something we haven’t yet discussed in great detail is boolean logic. We’ll see more in later lectures but one very helpful symbol is ! which is also known as the logical NOT. In essence this will take in a logical value or group of logical values and switch them from TRUE to FALSE and vice versa.

As we mentioned in Lecture 01, you can index your data structures with a series of logicals: TRUE to select, FALSE to exclude. We also know that is.na() produces a vector of logical values matching the indices of your input object. We can take this to the next level by combining the logical NOT with our is.na() results. This has the added bonus of avoiding the creation of an extra variable!

Let’s revisit this idea with our na_vector.

# Which values are NA?
is.na(na_vector)
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
# Flip the logical result
!is.na(na_vector)
## [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
# Apply this in our code for conditional indexing
# indexing using a size-matched logical vector
removed_na_vector_2 <- na_vector[!is.na(na_vector)] ; removed_na_vector_2
## [1] 5 6 7 7
# compare to using which() to index by position
na_vector[-which(is.na(na_vector))]
## [1] 5 6 7 7

Conditional indexing: That’s right! We just used conditional indexing in the above section to remove NA values from our na_vector. A data structure of booleans (TRUE and FALSE) can be used to select elements from within another data structure, as long as the relevant dimensions match! This becomes extremely relevant when we begin to filter our data frames based on specific criteria.
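One practical difference between the two approaches is worth a quick sketch. When a vector has no NA values at all, which() returns an empty integer vector, and negating an empty index selects nothing. The vector no_na_vector below is a hypothetical example.

```r
# A hypothetical vector with no missing values
no_na_vector <- c(5, 6, 7)

# which() finds no NA positions, so it returns an empty integer vector
empty_positions <- which(is.na(no_na_vector))
length(empty_positions)

# Excluding "nothing" by position drops every element
by_position <- no_na_vector[-empty_positions]
by_position

# The logical NOT approach keeps all three values, as intended
by_logical <- no_na_vector[!is.na(no_na_vector)]
by_logical
```

For removing NA values, indexing with !is.na() is therefore the safer default, since it behaves the same whether or not any NA values are present.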


3.2.7 Where are NA values within our tibble or data.frame?

We’ve been using a lot of examples with simple and small data structures but the infection_meta.tbl as we saw in section 3.2.3 was much harder to view. That’s where the proper use of which() can come in quite handy. Let’s see how it works in direct usage.

# Which values in infection_meta.tbl are NA? Recall we have 276 rows of data!
which(is.na(infection_meta.tbl))
##   [1] 4480 4481 4482 4483 4484 4485 4486 4487 4488 4489 4490 4491 4492 4493 4494
##  [16] 4495 4496 4497 4498 4499 4500 4501 4502 4503 4504 4505 4506 4507 4508 4509
##  [31] 4510 4511 4512 4513 4514 4515 4516 4517 4518 4519 4520 4521 4522 4523 4524
##  [46] 4525 4526 4527 4528 4529 4530 4531 4532 4533 4534 4535 4536 4537 4538 4539
##  [61] 4540 4541 4542 4543 4544 4545 4546 4547 4548 4549 4550 4551 4552 4553 4554
##  [76] 4555 4556 4557 4558 4559 4560 4561 4562 4563 4564 4565 4566 4567 4568 4569
##  [91] 4570 4571 4572 4573 4574 4575 4576 4577 4578 4579 4580 4581 4582 4583 4584
## [106] 4585 4586 4587 4588 4589 4590 4591 4592 4593 4594 4595 4596 4597 4598 4599
## [121] 4600 4601 4602 4603 4604 4605 4606 4607 4608 4609 4610 4611 4612 4613 4614
## [136] 4615 4616 4617 4618 4619 4620 4621 4622 4623 4624 4625 4626 4627 4628 4629
## [151] 4630 4631 4632 4633 4634 4635 4636 4637 4638 4639 4640 4641 4642 4643 4644
## [166] 4645 4646 4647 4648 4649 4650 4651 4652 4653 4654 4655 4656 4657 4658 4659
## [181] 4660 4661 4662 4663 4664 4665 4666 4667 4668 4669 4670 4671 4672 4673 4674
## [196] 4675 4676 4677 4678 4679 4680 4681 4682 4683 4684 4685 4686 4687 4688 4689
## [211] 4690 4691 4692 4756 4757 4758 4759 4760 4761 4762 4763 4764 4765 4766 4767
## [226] 4768 4769 4770 4771 4772 4773 4774 4775 4776 4777 4778 4779 4780 4781 4782
## [241] 4783 4784 4785 4786 4787 4788 4789 4790 4791 4792 4793 4794 4795 4796 4797
## [256] 4798 4799 4800 4801 4802 4803 6109 6110 6111 6118 6119 6120 6136 6137 6138
## [271] 6139 6140 6141 6142 6143 6144 6145 6146 6147 6148 6149 6150 6151 6152 6153
## [286] 6154 6155 6156

What do those values even mean? We are essentially seeing the positions of each NA element in infection_meta.tbl where the indices are assigned from top to bottom and then left to right. Thus 1-276 are values from column 1, 277-552 belong to column 2, etc.
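To make this numbering concrete, here is a minimal sketch on a hypothetical 4-row data frame (toy_df is not part of the course data). It uses the arr.ind argument of which(), which reports row and column positions instead of a single running index, and then reproduces the same result with integer arithmetic.

```r
# A small, hypothetical data frame: 4 rows, 3 columns
toy_df <- data.frame(id     = 1:4,
                     dose   = c(10, NA, 30, NA),
                     strain = c("N2", "N2", NA, "JU1400"))

# Linear positions count down column 1, then column 2, and so on
na_linear <- which(is.na(toy_df))
na_linear

# arr.ind = TRUE returns a matrix with "row" and "col" columns instead
na_rowcol <- which(is.na(toy_df), arr.ind = TRUE)
na_rowcol

# The same conversion by hand, using the number of rows
n_rows    <- nrow(toy_df)
row_index <- (na_linear - 1) %% n_rows + 1
col_index <- (na_linear - 1) %/% n_rows + 1

# Look up which columns hold the NA values
na_columns <- colnames(toy_df)[col_index]
na_columns
```

With 4 rows, positions 5 to 8 fall in column 2 and positions 9 to 12 in column 3, so the linear positions 6, 8 and 11 correspond to rows 2 and 4 of dose and row 3 of strain.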

3.2.7.1 Use complete.cases() to query larger objects

We have verified in many ways that we have at least one NA value in infection_meta.tbl. Often we may wish to drop incomplete observations where one or more variables are lacking data. Using the which() function would be helpful but, as we can see from our above example, it only returns the element positions across the entire data.frame. Instead, we want to look for rows that have any NA values. If you were only concerned with NA values in a specific column of your dataframe, which() would be a good way to accomplish your task.

In the case of removing any incomplete rows, the function complete.cases() looks by row to see whether any row contains an NA and returns a logical vector with each entry representing a row within the dataframe. You can then subset out the rows containing any NAs using conditional indexing.

# ?complete.cases

# Outputs a logical vector specifying which observations/rows have no missing values across the entire sequence.
head(complete.cases(infection_meta.tbl), 20)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE
# Use it wisely to keep complete rows. Pop quiz [x,y] will we index by x or by y?
str(infection_meta.tbl[complete.cases(infection_meta.tbl),])
## tibble [57 x 29] (S3: tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:57] "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr [1:57] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:57] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:57] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:57] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Worm_strain      : chr [1:57] "VC20019" "VC20019" "VC20019" "N2" ...
##  $ Total Worms      : num [1:57] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:57] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:57] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:57] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:57] 0 10 20 0 10 20 0 10 20 0 ...
##  $ Total ul spore   : num [1:57] 0 56.8 113.6 0 56.8 ...
##  $ Infection Round  : num [1:57] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:57] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:57] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:57] 0 0.354 0.708 0 0.354 ...
##  $ Time plated      : num [1:57] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:57] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:57] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:57] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:57] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:57] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:57] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:57] 190513 190513 190513 190430 190513 ...
##  $ Stain type       : chr [1:57] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##  $ Slide date       : num [1:57] 190515 190515 190515 190501 190515 ...
##  $ Slide number     : num [1:57] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Slide Box        : num [1:57] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:57] 190516 190516 190516 190502 190516 ...

Use the any() function to identify if any values (i.e. at least one) in a logical vector or expression evaluate to TRUE! This function returns a single logical value. This can be a very handy tool when you’re concerned more with completeness rather than individual values.
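Here is a small sketch of any() in action on a hypothetical vector (plate_times is invented for illustration). It also shows one thing worth watching for: when the logical vector itself contains NA and no TRUE, any() answers NA unless we set na.rm = TRUE.

```r
# Hypothetical toy vector with one missing value
plate_times <- c(1300, NA, 1300, 1250)

# is.na() answers element by element; any() collapses that to one value
has_missing <- any(is.na(plate_times))                 # TRUE

# A comparison against NA is itself NA, so any() cannot be sure here
unsure <- any(plate_times > 2000)                      # NA

# na.rm = TRUE ignores the NA and answers for the known values only
known_only <- any(plate_times > 2000, na.rm = TRUE)    # FALSE
```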

3.2.7.2 Combine which() with apply() to find where data might be missing

Hold up! We just removed our incomplete cases and went from 276 observations to a measly 57! Before we lose over 200 rows of our data, maybe we can take a quick look at where our NA values are located. Sometimes they exist in just a small number of columns that don’t really have much importance.

Now that we have a few tools under our belt, let’s figure out which() columns have any() values which are NA in our dataset. To do this, we’ll rope in the apply() function to help us loop through each column individually as well.

# Use the apply function to find columns with NA and then determine which columns return TRUE
which(apply(infection_meta.tbl, 
            MARGIN = 2,                  # Use the columns
            function(x) any(is.na(x))    # Here's our function to examine each column for NA values 
            ) # end apply
     ) # end which 
##    Time plated Time Incubated       Location 
##             17             18             23

We can see from the results of our code, that we really just have NA values in 3 metadata columns: Time plated, Time Incubated, and Location. What do you think the integer values in the resulting vector represent?

Use the combined anyNA() function to shortcut the use of the two functions any() and is.na(). You can use the anyNA() function to ask the same question as two! You can play with the code above to replace the function used in apply() with the anyNA().
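As a possible variation on the code above, sketched on a hypothetical toy data frame: sapply() visits each column of a data frame directly, without first converting it to a matrix the way apply() does, and colSums(is.na()) counts the missing values in each column rather than just flagging them. The object and column names here are invented for illustration.

```r
# Hypothetical toy data, not the lecture dataset
toy.df <- data.frame(strain      = c("N2", "JU1400", "AB1", "N2"),
                     time_plated = c(1300, NA, NA, 1300),
                     location    = c("Box 1", NA, "Box 2", "Box 2"))

# Which columns contain at least one NA? (names plus column positions)
cols_with_na <- which(sapply(toy.df, anyNA))
cols_with_na          # time_plated = 2, location = 3

# How many NA values does each column hold?
na_counts <- colSums(is.na(toy.df))
na_counts             # strain = 0, time_plated = 2, location = 1
```

Knowing the count per column can help decide whether a column is worth repairing or can simply be ignored.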


3.2.8 Consider just replacing the NAs with something useful

Depending on your data or situation, you may want to keep rows (observations) even though some of their values are missing. Rather than dropping them, consider replacing the NAs in your data set. The replacement could be a sample average, the mode of the data, or a value that is below a threshold.

# Replace the NA values in our table under the "Location" column.
# Note that this will permanently change our tibble!
infection_meta.tbl[is.na(infection_meta.tbl$Location), ]$Location <- "None"

# Check which columns have NA values now
which(apply(infection_meta.tbl, MARGIN = 2, 
            function(x) anyNA(x))) # Notice our use of the anyNA() function this time?
##    Time plated Time Incubated 
##             17             18
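The same indexing idea works for numeric replacements such as a sample average. The sketch below uses a hypothetical vector (spore_counts is invented for illustration) and fills on a copy so that the original values stay available for comparison.

```r
# Hypothetical measurements with two missing values
spore_counts <- c(12, NA, 18, 15, NA)

# Mean of the observed values only; without na.rm = TRUE the result is NA
observed_mean <- mean(spore_counts, na.rm = TRUE)   # 15

# Fill the gaps on a copy, leaving the original vector untouched
spore_filled <- spore_counts
spore_filled[is.na(spore_filled)] <- observed_mean
spore_filled                                        # 12 15 18 15 15
```

Whether a mean is a sensible stand-in depends on the variable, so treat this as one option among the replacements mentioned above.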

More about NA values: To learn about a few more functions that you can use to identify and remove NA values from your data structure, check out the Appendix at the end of this lecture.


Comprehension Question 3.0.0: Replace the NA values in Time plated and Time Incubated with the values 1300 and 1600 respectively. You can do this by completing the skeleton code below where we’ll make a copy of the tibble to work on called “comprehension_meta.tbl”. Check for NA values afterwards!

# comprehension answer code 3.0.0

# Copy our table over to new version
comprehension_meta.tbl <- infection_meta.tbl

# Fix the Time plated column
comprehension_meta.tbl[...), ]$... <- 1300

# Fix the Time Incubated column
comprehension_meta.tbl[...), ]$... <- 1600

# Will we have any NA values left?
any(is.na(comprehension_meta.tbl))

4.0.0 A quick introduction to the dplyr (DEE ply er) package

Now that we’ve inspected our data for various pitfalls, we can move on to filtering and sorting. Before we answer any questions with our data, we need the ability to select and filter parts of our data. This can be accomplished with base functions in R, but the dplyr package provides a more human-readable syntax.

Image courtesy of xkcd at https://xkcd.com/1906/

The dplyr package was made by Hadley Wickham to help make data frame manipulation easier. There are 5 major types of functions that we are concerned with in today’s lecture:

  1. filter() - subsets your data.frame by row
  2. select() - subsets your data.frame by columns
  3. arrange() - orders your data.frame alphabetically or numerically by ascending or descending variables
  4. mutate(), transmute() - create a new column of data
  5. summarize() or summarise() - reduces data to summary values (for example using mean(), sd(), min(), quantile(), etc)

There’s always more to explore (in dplyr)! Although we are focused on just a handful of dplyr functions, we will end up exploring some more as time goes by. The tidyverse packages actually have a very comprehensive set of web pages full of descriptions and examples for most of the functions in each tidyverse package. You can find the dplyr function reference on the package website at https://dplyr.tidyverse.org/reference/.


4.1.0 Use conditionals to specify subsets of your data based on criteria

It is often extremely useful to subset your data by some logical condition. We’ve seen some examples above where we used functions and code to identify and keep specific rows using conditional indexing. Let’s dig deeper into that topic.

Conditionals ask a question about one or more values and return a logical (TRUE or FALSE) result. Here’s a quick table breaking down the uses of basic conditional statements.

Logical operator   Meaning                             Example            Result
==                 value equivalence (i.e. equal to)   "this" == "that"   FALSE
!=                 not equal to                        4 != 5             TRUE
>                  greater than                        4 > 5              FALSE
>=                 greater than or equal to            4 >= 5             FALSE
<                  less than                           4 < 5              TRUE
<=                 less than or equal to               4 <= 5             TRUE

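Cautionary Note: == returns NA (neither TRUE nor FALSE) whenever one of the values being compared is NA. If you then use that result to index with [ ], each NA comes back as a row filled with NA rather than being dropped.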

Mastering the meaning and use of these logical operators will go a long way to helping you in your data science journey!


4.1.1 Use the match operator, %in%, to compare sets

Sometimes the simplest kind of conditional can be thought of as comparing two sets of data. Which values in set A exist in set B? As an example from our current dataset, we may want to keep all rows that have either N2 OR JU1400 in the Worm_strain column.

To accomplish this using basic functions in R, we turn to the match binary operator, %in%, which asks “for each element of x, is it present in y?” using the syntax x %in% y. This operator always returns a logical vector the same length as x, with TRUE wherever the element from x is found in y. Note that the sizes of x and y need not be identical!
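A quick sketch on two hypothetical vectors shows the shape of the answer: the result always follows the left-hand side, and an NA on the left simply counts as “not found” instead of producing NA.

```r
# Hypothetical vectors for illustration
x <- c("N2", "AB1", "JU1400", NA)
y <- c("JU1400", "N2")

# One answer per element of x, in the order of x
x_in_y <- x %in% y      # TRUE FALSE TRUE FALSE

# Swapping the sides asks a different question and changes the length
y_in_x <- y %in% x      # TRUE TRUE
```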

Let’s see what that looks like in the context of our above question.

# Find out more about the match operator by using double quotes
# ?"%in%"

# What does %in% return?
str(infection_meta.tbl$Worm_strain %in% c("N2", "JU1400"))
##  logi [1:276] FALSE FALSE FALSE TRUE TRUE TRUE ...
# You can filter your data using basic R commands
# Use the conditional result to index our data.frame
head(infection_meta.tbl[infection_meta.tbl$Worm_strain %in% c("N2", "JU1400"),])
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 2 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 3 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## 4 190426_JU1400_LUAm1_~ CM           Wild isola~           190423             25
## 5 190426_JU1400_LUAm1_~ CM           Wild isola~           190423             26
## 6 190426_JU1400_LUAm1_~ CM           Wild isola~           190423             27
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
# how many rows (entries) do we find with our query?
nrow(infection_meta.tbl[infection_meta.tbl$Worm_strain %in% c("N2", "JU1400"),])
## [1] 151
# A near-equivalent command using the logical OR
# This, however, is a cautionary example about filtering your data. Watch out for this command!

nrow(infection_meta.tbl[(infection_meta.tbl$Worm_strain == "N2" | infection_meta.tbl$Worm_strain == "JU1400" ),])
## [1] 151
# The above command will also return any entries with NA in your filtered criteria.
# Remember where we saw the `Time plated` column in our previous coding cells?
nrow(infection_meta.tbl[(infection_meta.tbl$`Time plated` == 1300),])
## [1] 276

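From our above output, we see that our first filtering step did return 151 rows of data as expected. However, we know that our Time plated column definitely has some NA values, yet we are returned all 276 rows. Each comparison against NA evaluates to NA rather than FALSE, and base indexing returns a row filled with NA for every NA in the index, so the missing entries have been kept instead of dropped!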
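To see this behaviour on something small enough to read, here is a sketch using a hypothetical four-row data frame (the names are invented for illustration). Wrapping the comparison in which() is one base R way to keep only the rows where the test is actually TRUE.

```r
# Hypothetical toy data with two missing plating times
toy.df <- data.frame(strain      = c("N2", "AB1", "JU1400", "N2"),
                     time_plated = c(1300, NA, 1200, NA))

# The comparison yields NA wherever time_plated is NA
is_1300 <- toy.df$time_plated == 1300      # TRUE NA FALSE NA

# Indexing with it returns the matching row plus one all-NA row per NA
n_with_na <- nrow(toy.df[is_1300, ])       # 3

# which() keeps only the positions that are TRUE, dropping the NA entries
n_true_only <- nrow(toy.df[which(is_1300), ])   # 1
```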

4.2.0 Use the filter() function to replicate %in% and more!

From our query above we already know we were asking R to search through our data frame under the Worm_strain column for any matches to N2 OR JU1400. The notation, however, can be a little confusing whereas the filter() function can accomplish the same task in a more human-readable syntax.

Using the filter() function we can evaluate each row with our criteria. Our first argument will be our data.frame, followed by the information for the rows we want to subset by. Parameters we are interested in are:

  • .data: A data frame or data frame extension (e.g. a tibble)
  • ...: Expressions that can return a logical value based on the variables within .data

Notably, filter() drops any NA rows/values that might result from our comparisons. Why is that important?
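A short sketch of that NA behaviour, assuming the dplyr package is installed and using a hypothetical toy data frame: filter() keeps only the rows where the condition is TRUE, so rows with missing values have to be asked for explicitly if we want them.

```r
library(dplyr)

# Hypothetical toy data with two missing plating times
toy.df <- data.frame(strain      = c("N2", "AB1", "JU1400", "N2"),
                     time_plated = c(1300, NA, 1200, NA))

# Only the single row where the comparison is TRUE survives
strict <- filter(toy.df, time_plated == 1300)
nrow(strict)       # 1

# Add is.na() to the condition to keep the rows with missing values too
lenient <- filter(toy.df, time_plated == 1300 | is.na(time_plated))
nrow(lenient)      # 3
```

This matters because a silent drop changes your row counts, while a silent keep (as with base indexing) can slip unknown values into a result you believe is clean.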

How do we go about constructing expressions for this function? Let’s give it a try!

# But the syntax using filter is much more human readable
filter(infection_meta.tbl, 
       Worm_strain == "N2" & Worm_strain == "JU1400")
## # A tibble: 0 x 29
## # i 29 variables: experiment <chr>, experimenter <chr>, description <chr>,
## #   Infection Date <dbl>, Plate Number <dbl>, Worm_strain <chr>,
## #   Total Worms <dbl>, Spore Strain <chr>, Spore Lot <chr>,
## #   Lot concentration <dbl>, Total Spores (M) <dbl>, Total ul spore <dbl>,
## #   Infection Round <dbl>, 40X OP50 (mL) <dbl>, Plate Size <dbl>,
## #   Spores(M)/cm2 <dbl>, Time plated <dbl>, Time Incubated <dbl>, Temp <dbl>,
## #   timepoint <chr>, infection.type <chr>, Fixing Date <dbl>, ...

4.2.1 Slicing and filtering your data requires the proper use of logical operators

Uh oh! Our code produced an empty tibble because we used the logical operator & (AND). For us it makes sense to want only N2 AND JU1400, but to R it won’t make sense because a worm strain can’t be both N2 AND JU1400 at the same time. That’s why we need to use the | (OR) operator to select everything that is N2 OR JU1400. Here’s a handy summary about the remaining logical operators.

Operator   Description                Use or Result
!          Logical NOT                Converts logical results into their opposite
&          Element-wise logical AND   Performs element-wise AND; the result length matches that of the longer operand
&&         Logical AND                Examines only the first element of each operand, resulting in a single-length logical vector **
|          Element-wise logical OR    Performs element-wise OR; the result length matches that of the longer operand
||         Logical OR                 Examines only the first element of each operand, resulting in a single-length logical vector **

** As of R 4.3.0, this will only compare single-length logical values

Logical operators: To summarize, “&” returns TRUE only if all elements in a single comparison are TRUE, while “|” returns TRUE if any element in that comparison is TRUE. This logic is applied between index-matched elements and can be combined into more complex statements!

Logical statement                            Evaluation
TRUE & TRUE                                  TRUE
TRUE & FALSE, FALSE & TRUE, FALSE & FALSE    FALSE
TRUE & TRUE & FALSE                          FALSE
TRUE | TRUE, TRUE | FALSE, FALSE | TRUE      TRUE
FALSE | FALSE                                FALSE
TRUE                                         TRUE
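The tables above can be checked directly in the console. This is a minimal sketch with hypothetical vectors a and b, plus a single value x to show where && fits.

```r
# Two hypothetical logical vectors covering every TRUE/FALSE pairing
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

and_result <- a & b     # TRUE FALSE FALSE FALSE
or_result  <- a | b     # TRUE TRUE TRUE FALSE
not_result <- !a        # FALSE FALSE TRUE TRUE

# && and || are meant for single values, for example inside if()
x <- 5
in_range <- x > 0 && x < 10    # TRUE
```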

Now, let’s try that filter() command again.

# Filter infection_meta.tbl using the proper logical operator
nrow(filter(infection_meta.tbl, 
            Worm_strain == "N2" | Worm_strain == "JU1400"))
## [1] 151
# Will this work?
nrow(filter(infection_meta.tbl, Worm_strain == c("N2", "JU1400")))
## [1] 88

4.2.2 A reminder/warning about vectors and recycling

What happened with our above command? Why did it return only 88 rows? To be honest, it was lucky that the operation worked at all! When R encounters operations between vectors of different size, it will recycle the shorter of the vectors when it can. We briefly discussed this idea in lecture 01 section 3.2.4.2 where we saw vector recycling happening with our matrix creation.

Here’s an example

c(1,2,3) + c(10,11)
## Warning in c(1, 2, 3) + c(10, 11): longer object length is not a multiple of
## shorter object length
## [1] 11 13 13

In this case, R gave us a warning that our vectors don’t match in length. It returned to us a vector of length 3 (our longest vector), and it recycled the 10 from the shorter vector to add to the 3. See the below table for clarification.

first value   second value    result
1             10              11
2             11              13
3             10 (recycled)   13

R will assume that you know what you are doing as long as the length of the longer vector is a whole multiple of the length of the shorter one. In the next example the shorter vector is used exactly twice, so no warning is given.

# What happens if we increase the length of our first vector?
c(1,2,3,4) + c(10,11)
## [1] 11 13 13 15

4.2.3 When filtering against a long list of criteria, use the match operator, %in%, instead of ==

Going back to our broken code:

nrow(filter(infection_meta.tbl, Worm_strain == c("N2", "JU1400")))

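while well intentioned, was basically saying “filter for odd rows where Worm_strain == "N2" and even rows where Worm_strain == "JU1400"”. Because the two-element vector is recycled down the column, each row is compared against only one of the two strains, which is why only 88 rows came back.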

Recall that %in% is a binary match operator that says “for each element in Worm_strain, does that element exist in the vector c("N2", "JU1400")?”

# Use the correct operator to get the job done when filtering with vectors
nrow(filter(infection_meta.tbl, 
            Worm_strain %in% c("N2", "JU1400")))
## [1] 151
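The difference between == and %in% in the commands above can be reproduced on a vector small enough to check by eye. This is a sketch with a hypothetical set of strain labels, not the lecture data.

```r
# Hypothetical labels; length 6 is a multiple of 2, so R gives no warning
strains <- c("N2", "N2", "JU1400", "JU1400", "AB1", "N2")

# == recycles c("N2", "JU1400"): odd positions are tested against "N2",
# even positions against "JU1400"
recycled_hits <- strains == c("N2", "JU1400")
recycled_hits          # TRUE FALSE FALSE TRUE FALSE FALSE
sum(recycled_hits)     # 2

# %in% tests every element against the whole set
set_hits <- strains %in% c("N2", "JU1400")
set_hits               # TRUE TRUE TRUE TRUE FALSE TRUE
sum(set_hits)          # 5
```

Five of the six labels belong to the set, but the recycled comparison only finds the two that happen to line up with the right position.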

4.2.4 Use filter() to identify matching candidates with criteria across multiple variables

We just filtered for multiple worm strains (multiple rows based on the identity of values in a single column). However, you can also filter for rows based on values in multiple columns. We can do this from basic principles too but this is where the filter() function really shines as it keeps the query language quite clear for us and others to read and interpret.

Operator precedence: Before we jump in there, we should quickly note that there is an order of precedence among operators. The more “mathematical” operators (arithmetic and comparisons) are evaluated before the logical operators that combine logical values (i.e. & and |), and among those & is evaluated before |. You can use parentheses () to group expressions and control the order of evaluation. Find out more in the R manual.
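A few one-liners make the precedence rules concrete. This is a small sketch; the variable names plate_size and spore_density are hypothetical stand-ins for columns like the ones in our table.

```r
# & is evaluated before |, so these two expressions are equivalent
no_parens   <- TRUE | FALSE & FALSE      # TRUE
same_as     <- TRUE | (FALSE & FALSE)    # TRUE

# Parentheses change the grouping, and with it the answer
regrouped   <- (TRUE | FALSE) & FALSE    # FALSE

# Comparisons are evaluated before & and |, so no parentheses are needed here
plate_size    <- 6
spore_density <- 0.3
both_true <- plate_size < 10 & spore_density >= 0.2   # TRUE
```

When a condition mixes & and |, adding parentheses even where they are not strictly required tends to make the intent easier to read.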

For example, you can use the following filtering combinations:

# Query for samples of either Worm strain N2 OR Spore Strain ERTm5 

head(filter(infection_meta.tbl, 
            Worm_strain == "N2" | `Spore Strain` == "ERTm5"))
## # A tibble: 6 x 29
##   experiment            experimenter description `Infection Date` `Plate Number`
##   <chr>                 <chr>        <chr>                  <dbl>          <dbl>
## 1 190426_N2_LUAm1_0M_7~ CM           Wild isola~           190423              4
## 2 190426_N2_LUAm1_10M_~ CM           Wild isola~           190423              5
## 3 190426_N2_LUAm1_20M_~ CM           Wild isola~           190423              6
## 4 190426_VC20019_ERTm5~ CM           Wild isola~           190423             34
## 5 190426_VC20019_ERTm5~ CM           Wild isola~           190423             35
## 6 190426_VC20019_ERTm5~ CM           Wild isola~           190423             36
## # i 24 more variables: Worm_strain <chr>, `Total Worms` <dbl>,
## #   `Spore Strain` <chr>, `Spore Lot` <chr>, `Lot concentration` <dbl>,
## #   `Total Spores (M)` <dbl>, `Total ul spore` <dbl>, `Infection Round` <dbl>,
## #   `40X OP50 (mL)` <dbl>, `Plate Size` <dbl>, `Spores(M)/cm2` <dbl>,
## #   `Time plated` <dbl>, `Time Incubated` <dbl>, Temp <dbl>, timepoint <chr>,
## #   infection.type <chr>, `Fixing Date` <dbl>, Location <chr>,
## #   `Staining Date` <dbl>, `Stain type` <chr>, `Slide date` <dbl>, ...
# == means "is exactly". 
# Query for rows with Plate Size = 6 and Spores(M)/cm2 = 0
str(filter(infection_meta.tbl,
           `Plate Size` == 6 & `Spores(M)/cm2` == 0), 
    give.attr = FALSE)
## spc_tbl_ [77 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:77] "190426_VC20019_LUAm1_0M_72hpi" "190426_N2_LUAm1_0M_72hpi" "190426_AB1_LUAm1_0M_72hpi" "190426_JU397_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr [1:77] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:77] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:77] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
##  $ Worm_strain      : chr [1:77] "VC20019" "N2" "AB1" "JU397" ...
##  $ Total Worms      : num [1:77] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:77] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:77] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:77] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Total ul spore   : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Infection Round  : num [1:77] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:77] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:77] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Time plated      : num [1:77] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:77] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:77] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:77] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:77] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:77] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:77] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:77] 190513 190430 190430 190430 190430 ...
##  $ Stain type       : chr [1:77] "Sp.9 FISH + DY96" "DY96" "DY96" "DY96" ...
##  $ Slide date       : num [1:77] 190515 190501 190501 190501 190501 ...
##  $ Slide number     : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
##  $ Slide Box        : num [1:77] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:77] 190516 190502 190502 190502 190502 ...
# equivalently, the ',' represents an implicit &
str(filter(infection_meta.tbl,
           `Plate Size` == 6,
           `Spores(M)/cm2` == 0),
    give.attr = FALSE)
## spc_tbl_ [77 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:77] "190426_VC20019_LUAm1_0M_72hpi" "190426_N2_LUAm1_0M_72hpi" "190426_AB1_LUAm1_0M_72hpi" "190426_JU397_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr [1:77] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:77] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:77] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
##  $ Worm_strain      : chr [1:77] "VC20019" "N2" "AB1" "JU397" ...
##  $ Total Worms      : num [1:77] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:77] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:77] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:77] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Total ul spore   : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Infection Round  : num [1:77] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:77] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:77] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:77] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Time plated      : num [1:77] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:77] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:77] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:77] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:77] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:77] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:77] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:77] 190513 190430 190430 190430 190430 ...
##  $ Stain type       : chr [1:77] "Sp.9 FISH + DY96" "DY96" "DY96" "DY96" ...
##  $ Slide date       : num [1:77] 190515 190501 190501 190501 190501 ...
##  $ Slide number     : num [1:77] 1 4 7 10 13 16 19 22 25 28 ...
##  $ Slide Box        : num [1:77] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:77] 190516 190502 190502 190502 190502 ...
# != means "is not"
# Query for experiments on plate size not equal to 6 and Spore density not equal to 0
str(filter(infection_meta.tbl,
           `Plate Size` != 6,
           `Spores(M)/cm2` != 0),
    give.attr = FALSE)
## spc_tbl_ [18 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:18] "200821_N2_LUAm1_30M_72hpi" "200821_JU1400_LUAm1_30M_72hpi" "200821_N2_ERTm5_10M_72hpi" "200821_JU1400_ERTm5_10M_72hpi" ...
##  $ experimenter     : chr [1:18] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:18] "RNAseq data rep 1" "RNAseq data rep 1" "RNAseq data rep 1" "RNAseq data rep 1" ...
##  $ Infection Date   : num [1:18] 200818 200818 200818 200818 200818 ...
##  $ Plate Number     : num [1:18] 13 14 15 16 17 18 27 28 29 30 ...
##  $ Worm_strain      : chr [1:18] "N2" "JU1400" "N2" "JU1400" ...
##  $ Total Worms      : num [1:18] 10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 ...
##  $ Spore Strain     : chr [1:18] "LUAm1" "LUAm1" "ERTm5" "ERTm5" ...
##  $ Spore Lot        : chr [1:18] "2A" "2A" "2" "2" ...
##  $ Lot concentration: num [1:18] 176000 176000 427000 427000 370000 370000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:18] 30 30 10 10 12 12 10 10 10 10 ...
##  $ Total ul spore   : num [1:18] 170.5 170.5 23.4 23.4 32.4 ...
##  $ Infection Round  : num [1:18] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:18] 1 1 1 1 1 1 0.25 0.25 0.25 0.25 ...
##  $ Plate Size       : num [1:18] 10 10 10 10 10 10 10 10 10 10 ...
##  $ Spores(M)/cm2    : num [1:18] 0.382 0.382 0.127 0.127 0.153 ...
##  $ Time plated      : num [1:18] NA NA NA NA NA NA NA NA NA NA ...
##  $ Time Incubated   : num [1:18] 48 48 48 48 48 48 72 72 72 72 ...
##  $ Temp             : num [1:18] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:18] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:18] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:18] 200821 200821 200821 200821 200821 ...
##  $ Location         : chr [1:18] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:18] 200831 200831 200831 200831 200831 ...
##  $ Stain type       : chr [1:18] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "microB FISH + DY96" "microB FISH + DY96" ...
##  $ Slide date       : num [1:18] 200903 200903 200903 200903 200903 ...
##  $ Slide number     : num [1:18] 13 14 15 16 17 18 27 28 29 30 ...
##  $ Slide Box        : num [1:18] 4 4 4 4 4 4 4 4 4 4 ...
##  $ Imaging Date     : num [1:18] 200912 200912 200912 200912 200912 ...
# >= means "greater than or equal to"
# Query for experiments completed on plate size < 10cm and with more than or equal to 0.2 Spores(M)/cm2
str(filter(infection_meta.tbl,
           `Plate Size` < 10,
           `Spores(M)/cm2` >= 0.2),
    give.attr = FALSE)
## spc_tbl_ [79 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:79] "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_10M_72hpi" "190426_N2_LUAm1_20M_72hpi" ...
##  $ experimenter     : chr [1:79] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:79] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:79] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:79] 2 3 5 6 8 9 11 12 14 15 ...
##  $ Worm_strain      : chr [1:79] "VC20019" "VC20019" "N2" "N2" ...
##  $ Total Worms      : num [1:79] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:79] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:79] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:79] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:79] 10 20 10 20 10 20 10 20 10 20 ...
##  $ Total ul spore   : num [1:79] 56.8 113.6 56.8 113.6 56.8 ...
##  $ Infection Round  : num [1:79] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:79] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:79] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:79] 0.354 0.708 0.354 0.708 0.354 ...
##  $ Time plated      : num [1:79] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:79] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:79] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:79] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:79] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:79] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:79] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:79] 190513 190513 190513 190513 190513 ...
##  $ Stain type       : chr [1:79] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" ...
##  $ Slide date       : num [1:79] 190515 190515 190515 190515 190515 ...
##  $ Slide number     : num [1:79] 2 3 5 6 8 9 11 12 14 15 ...
##  $ Slide Box        : num [1:79] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:79] 190516 190516 190516 190516 190516 ...
# <= means "less than or equal to"
# Query for experiments where the Spores(M)/cm2 ratio is above 0 and <= 0.5
str(filter(infection_meta.tbl,
           `Spores(M)/cm2` > 0,
           `Spores(M)/cm2` <= 0.5), 
    give.attr = FALSE)
## spc_tbl_ [177 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:177] "190426_VC20019_LUAm1_10M_72hpi" "190426_N2_LUAm1_10M_72hpi" "190426_AB1_LUAm1_10M_72hpi" "190426_JU397_LUAm1_10M_72hpi" ...
##  $ experimenter     : chr [1:177] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:177] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:177] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:177] 2 5 8 11 14 17 20 23 26 29 ...
##  $ Worm_strain      : chr [1:177] "VC20019" "N2" "AB1" "JU397" ...
##  $ Total Worms      : num [1:177] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:177] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:177] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:177] 176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total Spores (M) : num [1:177] 10 10 10 10 10 10 10 10 10 10 ...
##  $ Total ul spore   : num [1:177] 56.8 56.8 56.8 56.8 56.8 ...
##  $ Infection Round  : num [1:177] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:177] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:177] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:177] 0.354 0.354 0.354 0.354 0.354 ...
##  $ Time plated      : num [1:177] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:177] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:177] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:177] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:177] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:177] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:177] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:177] 190513 190513 190513 190513 190513 ...
##  $ Stain type       : chr [1:177] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" ...
##  $ Slide date       : num [1:177] 190515 190515 190515 190515 190515 ...
##  $ Slide number     : num [1:177] 2 5 8 11 14 17 20 23 26 29 ...
##  $ Slide Box        : num [1:177] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:177] 190516 190516 190516 190516 190516 ...
# Further filter the information by only using data from worm strains N2 and JU1400
str(filter(infection_meta.tbl,
           `Spores(M)/cm2` > 0,
           `Spores(M)/cm2` <= 0.5,
           Worm_strain %in% c("N2", "JU1400")),
    give.attr = FALSE)
## spc_tbl_ [110 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:110] "190426_N2_LUAm1_10M_72hpi" "190426_JU1400_LUAm1_10M_72hpi" "190426_N2_ERTm5_1.75M_72hpi" "190426_N2_ERTm5_3.5M_72hpi" ...
##  $ experimenter     : chr [1:110] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:110] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:110] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:110] 5 26 38 39 47 48 56 57 13 15 ...
##  $ Worm_strain      : chr [1:110] "N2" "JU1400" "N2" "N2" ...
##  $ Total Worms      : num [1:110] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:110] "LUAm1" "LUAm1" "ERTm5" "ERTm5" ...
##  $ Spore Lot        : chr [1:110] "2A" "2A" "2" "2" ...
##  $ Lot concentration: num [1:110] 176000 176000 427000 427000 427000 427000 240000 240000 176000 176000 ...
##  $ Total Spores (M) : num [1:110] 10 10 1.75 3.5 1.75 3.5 0.5 1.5 1 1 ...
##  $ Total ul spore   : num [1:110] 56.8 56.8 4.1 8.2 4.1 ...
##  $ Infection Round  : num [1:110] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:110] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:110] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:110] 0.3539 0.3539 0.0619 0.1238 0.0619 ...
##  $ Time plated      : num [1:110] 1300 1300 1300 1300 1300 1300 1300 1300 NA NA ...
##  $ Time Incubated   : num [1:110] 1600 1600 1600 1600 1600 1600 1600 1600 NA NA ...
##  $ Temp             : num [1:110] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:110] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:110] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:110] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:110] "Sample exhausted" "Sample exhausted" "None" "None" ...
##  $ Staining Date    : num [1:110] 190513 190513 190529 190529 190529 ...
##  $ Stain type       : chr [1:110] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "microb FISH + DY96" "microb FISH + DY96" ...
##  $ Slide date       : num [1:110] 190515 190515 190530 190530 190530 ...
##  $ Slide number     : num [1:110] 5 26 38 39 47 48 56 57 1 3 ...
##  $ Slide Box        : num [1:110] 2 2 3 3 3 3 2 2 3 3 ...
##  $ Imaging Date     : num [1:110] 190516 190516 201026 201026 201026 ...
# What if we wanted to view strains that are NOT N2 or JU1400?
str(filter(infection_meta.tbl,
           `Spores(M)/cm2` > 0,
           `Spores(M)/cm2` <= 0.5,
           !Worm_strain %in% c("N2", "JU1400")), 
    give.attr = FALSE)
## spc_tbl_ [67 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr [1:67] "190426_VC20019_LUAm1_10M_72hpi" "190426_AB1_LUAm1_10M_72hpi" "190426_JU397_LUAm1_10M_72hpi" "190426_JU642_LUAm1_10M_72hpi" ...
##  $ experimenter     : chr [1:67] "CM" "CM" "CM" "CM" ...
##  $ description      : chr [1:67] "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection Date   : num [1:67] 190423 190423 190423 190423 190423 ...
##  $ Plate Number     : num [1:67] 2 8 11 14 17 20 23 29 32 35 ...
##  $ Worm_strain      : chr [1:67] "VC20019" "AB1" "JU397" "JU642" ...
##  $ Total Worms      : num [1:67] 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore Strain     : chr [1:67] "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore Lot        : chr [1:67] "2A" "2A" "2A" "2A" ...
##  $ Lot concentration: num [1:67] 176000 176000 176000 176000 176000 176000 176000 176000 176000 427000 ...
##  $ Total Spores (M) : num [1:67] 10 10 10 10 10 10 10 10 10 1.75 ...
##  $ Total ul spore   : num [1:67] 56.8 56.8 56.8 56.8 56.8 ...
##  $ Infection Round  : num [1:67] 1 1 1 1 1 1 1 1 1 1 ...
##  $ 40X OP50 (mL)    : num [1:67] 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate Size       : num [1:67] 6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores(M)/cm2    : num [1:67] 0.354 0.354 0.354 0.354 0.354 ...
##  $ Time plated      : num [1:67] 1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time Incubated   : num [1:67] 1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : num [1:67] 21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr [1:67] "72" "72" "72" "72" ...
##  $ infection.type   : chr [1:67] "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing Date      : num [1:67] 190426 190426 190426 190426 190426 ...
##  $ Location         : chr [1:67] "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining Date    : num [1:67] 190513 190513 190513 190513 190513 ...
##  $ Stain type       : chr [1:67] "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" ...
##  $ Slide date       : num [1:67] 190515 190515 190515 190515 190515 ...
##  $ Slide number     : num [1:67] 2 8 11 14 17 20 23 29 32 35 ...
##  $ Slide Box        : num [1:67] 2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging Date     : num [1:67] 190516 190516 190516 190516 190516 ...
# Be careful how you filter your data! If none of the rows meet your criteria, filter() returns an empty tibble rather than an error!
# Query experiments for any instances of Spores(M)/cm2 < 0.
str(filter(infection_meta.tbl,
           `Spores(M)/cm2` < 0),
    give.attr = FALSE)
## spc_tbl_ [0 x 29] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ experiment       : chr(0) 
##  $ experimenter     : chr(0) 
##  $ description      : chr(0) 
##  $ Infection Date   : num(0) 
##  $ Plate Number     : num(0) 
##  $ Worm_strain      : chr(0) 
##  $ Total Worms      : num(0) 
##  $ Spore Strain     : chr(0) 
##  $ Spore Lot        : chr(0) 
##  $ Lot concentration: num(0) 
##  $ Total Spores (M) : num(0) 
##  $ Total ul spore   : num(0) 
##  $ Infection Round  : num(0) 
##  $ 40X OP50 (mL)    : num(0) 
##  $ Plate Size       : num(0) 
##  $ Spores(M)/cm2    : num(0) 
##  $ Time plated      : num(0) 
##  $ Time Incubated   : num(0) 
##  $ Temp             : num(0) 
##  $ timepoint        : chr(0) 
##  $ infection.type   : chr(0) 
##  $ Fixing Date      : num(0) 
##  $ Location         : chr(0) 
##  $ Staining Date    : num(0) 
##  $ Stain type       : chr(0) 
##  $ Slide date       : num(0) 
##  $ Slide number     : num(0) 
##  $ Slide Box        : num(0) 
##  $ Imaging Date     : num(0)
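
Because an empty result is returned silently, one way to catch it is to check the row count before moving on. The following is a minimal sketch that assumes a small made-up tibble, toy.tbl, with a hypothetical spores_per_cm2 column standing in for the course data.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(Worm_strain = c("N2", "JU1400", "AB1"),
                  spores_per_cm2 = c(0, 0.354, 0.708))

# No row has a negative value, so the result has zero rows
negative.tbl <- filter(toy.tbl, spores_per_cm2 < 0)

# nrow() reports how many rows survived the filter
n_matches <- nrow(negative.tbl)

if (n_matches == 0) {
  message("No rows matched the filter criteria")
}
```

The columns are all still present in the empty tibble, which is why later steps may run without complaint even though there is no data left.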

4.2.5 Regular expressions can also be used to filter through your data

Regular expressions (regex) are a compact and powerful pattern language that can also be used for partial character matching. Regex is supported in most programming languages, not only in R, so familiarizing yourself with regex is well worth the effort as a programmer.

We will spend a large chunk of Lecture 05 discussing regular expressions. Until then, just remember that you can use them as part of your filtering process. Below you’ll find some useful functions that can help you accomplish this.

# More about regex
?regex

# search for matches to argument pattern
?grep()
?grepl()
?regexpr()
?gregexpr()
?regexec()

# perform replacement of the first and all matches respectively.
?sub()
?gsub()
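
As a small preview of how these functions fit into filtering, here is a hedged sketch using grepl() inside filter(). The tibble toy.tbl and its total_spores column are made up for illustration, and the pattern "^JU" is just one possible choice.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(Worm_strain = c("N2", "JU1400", "JU397", "AB1"),
                  total_spores = c(10, 10, 20, 0))

# grepl() returns TRUE/FALSE for each element, which is what filter() expects
# The pattern "^JU" matches strain names that begin with the letters JU
ju_strains.tbl <- filter(toy.tbl, grepl("^JU", Worm_strain))

ju_strains.tbl
```

Compared with %in%, this avoids having to list every strain name explicitly, at the cost of needing a pattern that matches only what you intend.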

4.3.0 Use select() to subset and order columns in your data frame

Often we don’t want all of the columns in our data frame. You can keep, remove, or reorder columns by using the select() function, which makes it a great way both to move columns around your data frame and to pick out just the data columns you want.

The select() function takes the format of select(data, ...) where

  • data is your data.frame or tibble object.
  • ... is a comma-separated list of column names from data based on a concise mini-language used in the tidyverse.

While there are many ways to select your columns in this function, we’ll cover a handful of the more common ways.

Suppose I want to look only at some of the experimental information, including the various infection/fixing/imaging dates as well as the worm and spore strain information.

# We just want to know information related to strain names, spore info and dates

head(select(infection_meta.tbl, 
            experiment, Worm_strain, 
            `Spore Strain`, `Spore Lot`, `Total Spores (M)`, 
            `Infection Date`, `Fixing Date`, `Imaging Date` ))
## # A tibble: 6 x 8
##   experiment           Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
##   <chr>                <chr>       <chr>          <chr>                    <dbl>
## 1 190426_VC20019_LUAm~ VC20019     LUAm1          2A                           0
## 2 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          10
## 3 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          20
## 4 190426_N2_LUAm1_0M_~ N2          LUAm1          2A                           0
## 5 190426_N2_LUAm1_10M~ N2          LUAm1          2A                          10
## 6 190426_N2_LUAm1_20M~ N2          LUAm1          2A                          20
## # i 3 more variables: `Infection Date` <dbl>, `Fixing Date` <dbl>,
## #   `Imaging Date` <dbl>
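
The reordering behaviour mentioned above can be sketched on a small made-up tibble. Here toy.tbl is hypothetical, and everything() is a tidyselect helper that stands for all columns not yet named.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(experiment = c("exp1", "exp2"),
                  Worm_strain = c("N2", "JU1400"),
                  `Spore Strain` = c("LUAm1", "ERTm5"),
                  `Infection Date` = c(190423, 190605))

# Columns come back in the order they are listed
reordered.tbl <- select(toy.tbl, `Infection Date`, experiment, Worm_strain)

# everything() fills in the remaining columns after the ones named first
date_first.tbl <- select(toy.tbl, `Infection Date`, everything())

colnames(reordered.tbl)
colnames(date_first.tbl)
```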

4.3.1 Use starts_with() and ends_with() helper functions to specify elements from a vector

dplyr also includes some helper functions that allow you to select variables (columns) based on their names. For example, we have both Spore Strain and Spore Lot columns. We can shortcut both of those using the starts_with() helper function. Likewise we can select all of our “X Date” columns using the ends_with() function.

# Select for columns starting with the word "Spore" or ending with "Date"

head(select(infection_meta.tbl, 
            experiment, Worm_strain, 
            starts_with("Spore", ignore.case = FALSE), `Total Spores (M)`, 
            ends_with("Date", ignore.case = FALSE)))
## # A tibble: 6 x 10
##   experiment              Worm_strain `Spore Strain` `Spore Lot` `Spores(M)/cm2`
##   <chr>                   <chr>       <chr>          <chr>                 <dbl>
## 1 190426_VC20019_LUAm1_0~ VC20019     LUAm1          2A                    0    
## 2 190426_VC20019_LUAm1_1~ VC20019     LUAm1          2A                    0.354
## 3 190426_VC20019_LUAm1_2~ VC20019     LUAm1          2A                    0.708
## 4 190426_N2_LUAm1_0M_72h~ N2          LUAm1          2A                    0    
## 5 190426_N2_LUAm1_10M_72~ N2          LUAm1          2A                    0.354
## 6 190426_N2_LUAm1_20M_72~ N2          LUAm1          2A                    0.708
## # i 5 more variables: `Total Spores (M)` <dbl>, `Infection Date` <dbl>,
## #   `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>

As you can see from above, by using these helper verbs, we were able to pick up some extra columns that we’d “forgotten” about that would have some helpful information. We also reduced the amount of coding we had to generate and reduced our chances of errors due to spelling or typos!


4.3.2 Use the contains() helper function to select text within column names!

Now that we can base our column selection on the start or end of column names, we can also select on words or patterns that occur anywhere within them. Let’s do one last example and simplify how we select column names like Spore Strain and Total Spores (M).

A note about helper functions! The one caveat in our quest to simplify selecting columns is that we don’t have as much control over the specific selection order. Within each helper function, the resulting selections are ordered based on their relative placement within the data frame or tibble.

# Simplify our previous column selections using the contains() helper
# Save the result as meta_trimmed.tbl

meta_trimmed.tbl <- select(infection_meta.tbl, 
                           experiment, Worm_strain, 
                           contains("Spore", ignore.case = FALSE),
                           ends_with("Date", ignore.case = FALSE))

# Take a look at the resulting table
head(meta_trimmed.tbl)
## # A tibble: 6 x 10
##   experiment           Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
##   <chr>                <chr>       <chr>          <chr>                    <dbl>
## 1 190426_VC20019_LUAm~ VC20019     LUAm1          2A                           0
## 2 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          10
## 3 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          20
## 4 190426_N2_LUAm1_0M_~ N2          LUAm1          2A                           0
## 5 190426_N2_LUAm1_10M~ N2          LUAm1          2A                          10
## 6 190426_N2_LUAm1_20M~ N2          LUAm1          2A                          20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## #   `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
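
To see the ordering caveat in isolation, here is a minimal sketch on a made-up one-row tibble (toy.tbl is hypothetical). The helper returns its matches in the order they sit in the tibble, while naming the columns lets us set the order ourselves.

```r
library(dplyr)

# Hypothetical toy data with "Spore Lot" deliberately placed last
toy.tbl <- tibble(`Spore Strain` = "LUAm1",
                  `Total Spores (M)` = 10,
                  `Spore Lot` = "2A")

# contains() returns matches in their original left-to-right order
by_helper <- select(toy.tbl, contains("Spore"))

# Naming the columns explicitly lets us choose the order
by_name <- select(toy.tbl, `Spore Strain`, `Spore Lot`, `Total Spores (M)`)

colnames(by_helper)
colnames(by_name)
```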

4.4.0 Sort your data with arrange()

The arrange(data, ...) function helps you to sort your data. The default behaviour is to order results from smallest to largest (or a-z for character data). You can switch the order by wrapping a column in desc() (descending) as shown below. As with sorting in Excel, you can sort on multiple columns by separating them with a comma: rows are ordered by the first column named, and ties are broken by each subsequent column.

Let’s sort the meta_trimmed.tbl that we’ve generated in previous steps.

# Arrange the trimmed metadata in descending order of Total Spores
desc_totalSpores <- arrange(meta_trimmed.tbl, desc(`Total Spores (M)`))

# Take a look at the sorted data
head(desc_totalSpores)
## # A tibble: 6 x 10
##   experiment           Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
##   <chr>                <chr>       <chr>          <chr>                    <dbl>
## 1 200824_N2-rep1_LUAm~ N2-rep1     LUAm1          2A                          40
## 2 200824_JU1400-rep1_~ JU1400-rep1 LUAm1          2A                          40
## 3 200821_N2_LUAm1_30M~ N2          LUAm1          2A                          30
## 4 200821_JU1400_LUAm1~ JU1400      LUAm1          2A                          30
## 5 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          20
## 6 190426_N2_LUAm1_20M~ N2          LUAm1          2A                          20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## #   `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
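
Sorting on more than one column can be sketched as follows, assuming a made-up tibble toy.tbl with a hypothetical total_spores column. The first column named takes precedence and the second only breaks ties.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(Worm_strain = c("N2", "JU1400", "N2", "JU1400"),
                  total_spores = c(10, 20, 20, 10))

# Sort by strain (a-z) first, then by spores from largest to smallest
sorted.tbl <- arrange(toy.tbl, Worm_strain, desc(total_spores))

sorted.tbl
```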

4.5.0 Multiple steps can begin to pile up!

Suppose we want to take the sorted data above and keep only infection experiments with > 0M spores, in samples using only N2 and JU1400 worms infected by LUAm1 or ERTm5 microsporidia. We then want to arrange these by infection date in ascending order.

How would you do it? How many experimental conditions are there that meet our criteria?

# Filter the data by Total spores, worm strains, and microsporidia strains
# Extra var 1
desc_totalSpores_filtered <- filter(desc_totalSpores,
                                    `Total Spores (M)` > 0,
                                    Worm_strain %in% c("N2", "JU1400"),
                                    `Spore Strain` %in% c("LUAm1", "ERTm5")) 

# Sort the data by Infection Date
# Extra var 2
desc_totalSpores_filtered_asc <- arrange(desc_totalSpores_filtered, `Infection Date`) 

# Retrieve the experiment names
# Extra var 3
select_experiments <- select(desc_totalSpores_filtered_asc, experiment) 

# How many observations (rows) are there?
nrow(select_experiments)

4.5.1 Pass along function output to new functions using the piping symbol %>%

While the above code answered the question, it also created a series of intermediate variables that we aren’t interested in. These ‘intermediate variables’ were used to store data that was passed as input to the next function. You’ll notice that we didn’t need them for anything else in the code! If we aren’t careful, they will quickly clutter our global environment (and memory!), which is where R keeps track of the objects in our session. Instead, we can use a more “natural flow” of data to produce our code.

The dplyr package, and some other common tidyverse packages for data frame manipulation, allow the use of the pipe operator, %>%. This is analogous to | for any UNIX aficionados. Piping allows the output of one function to be passed along to the next function without creating intermediate variables. Piping can save typing, make your code more readable, and reduce clutter in your global environment from variables you don’t need. The keyboard shortcut for %>% in RStudio is CTRL+SHIFT+M (CMD+SHIFT+M on macOS).

In essence the %>% pipe takes output from the left-hand side and passes it as input to the right-hand side. As an example we’ll look at how pipes work in conjunction with the function filter(), and then see the benefits to simplifying the code that we just wrote.

# Remember that R evaluates nested () from the innermost call outward
head(filter(meta_trimmed.tbl, `Total Spores (M)` > 0))
## # A tibble: 6 x 10
##   experiment           Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
##   <chr>                <chr>       <chr>          <chr>                    <dbl>
## 1 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          10
## 2 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          20
## 3 190426_N2_LUAm1_10M~ N2          LUAm1          2A                          10
## 4 190426_N2_LUAm1_20M~ N2          LUAm1          2A                          20
## 5 190426_AB1_LUAm1_10~ AB1         LUAm1          2A                          10
## 6 190426_AB1_LUAm1_20~ AB1         LUAm1          2A                          20
## # i 5 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## #   `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>
# Break the nested functions into their order of execution
meta_trimmed.tbl %>% filter(`Total Spores (M)` > 0) %>% head()

# You can put each function in the pipeline on its own line
meta_trimmed.tbl %>%
    # Notice the "." in the first position of filter - this is where data normally is assigned as a parameter
    filter(., `Total Spores (M)` > 0) %>% 
    # Pass the filtered data to the head() function
    head()
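
To convince yourself that nesting and piping give the same answer, the two forms can be compared directly. This is a minimal sketch on a made-up tibble (toy.tbl and total_spores are hypothetical names), using all.equal() to compare the results.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(Worm_strain = c("N2", "JU1400", "AB1"),
                  total_spores = c(0, 10, 20))

# Nested call: read from the inside out
nested_result <- head(filter(toy.tbl, total_spores > 0), 1)

# Piped call: read from top to bottom, no intermediate variables
piped_result <- toy.tbl %>%
  filter(total_spores > 0) %>%
  head(1)

# Compare the two results
all.equal(nested_result, piped_result)
```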

4.5.2 Using . with %>% denotes the object produced by the last called function

You’ll notice that when piping, we are not explicitly writing the first argument (our data frame) to filter(), but rather passing the first argument to filter() using %>%. The dot . is a placeholder for the piped object and can be used to fill in the first argument or a later one. This notation is useful for nested functions (functions inside functions) within our piping, which we will come across a bit later.
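
A short sketch of the placeholder in both positions may help. It assumes a made-up tibble toy.tbl with hypothetical columns total_spores and infected, and uses base R's lm() only as an example of a function whose data argument is not first.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(total_spores = c(0, 10, 20, 30),
                  infected = c(0, 12, 25, 33))

# By default the piped object becomes the FIRST argument, so these two match
first_arg <- toy.tbl %>% filter(total_spores > 0)
explicit_dot <- toy.tbl %>% filter(., total_spores > 0)

# lm() expects a formula first and the data second,
# so "." tells the pipe to place the tibble in the data argument instead
fit <- toy.tbl %>% lm(infected ~ total_spores, data = .)

class(fit)
```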

What would working with pipes look like for our more complex example? Recall we want to filter for infection experiments with > 0M spores in samples using only N2 and JU1400 worms infected by LUAm1 and ERTm5 microsporidia. Arrange these by infection date in ascending order and display the first 20 entries.

# 1. Filter the data
# 2. Arrange the result
# 3. Grab the experiment
# 4. Print the first 20 entries
meta_trimmed.tbl %>% 
  filter(`Total Spores (M)` > 0, Worm_strain %in% c("N2", "JU1400"),`Spore Strain` %in% c("LUAm1", "ERTm5")) %>% 
  arrange(`Infection Date`) %>% 
  select(experiment) %>% 
  head(20)  
## # A tibble: 20 x 1
##    experiment                     
##    <chr>                          
##  1 190426_N2_LUAm1_10M_72hpi      
##  2 190426_N2_LUAm1_20M_72hpi      
##  3 190426_JU1400_LUAm1_10M_72hpi  
##  4 190426_JU1400_LUAm1_20M_72hpi  
##  5 190426_N2_ERTm5_1.75M_72hpi    
##  6 190426_N2_ERTm5_3.5M_72hpi     
##  7 190426_JU1400_ERTm5_1.75M_72hpi
##  8 190426_JU1400_ERTm5_3.5M_72hpi 
##  9 190605_N2_LUAm1_1M_24hpi       
## 10 190605_JU1400_LUAm1_1M_24hpi   
## 11 190606_N2_LUAm1_1M_48hpi       
## 12 190606_JU1400_LUAm1_1M_48hpi   
## 13 190607_N2_LUAm1_1M_72hpi       
## 14 190607_JU1400_LUAm1_1M_72hpi   
## 15 190611_N2_LUAm1_6M_0.5hpi      
## 16 190611_JU1400_LUAm1_6M_0.5hpi  
## 17 190611_N2_LUAm1_6M_3hpi        
## 18 190611_JU1400_LUAm1_6M_3hpi    
## 19 190808_N2_LUAm1_5M_24hpi       
## 20 190808_JU1400_LUAm1_5M_24hpi

4.5.3 Use spacing and new lines to keep track of your directing

A pipeline with more than 2 pipes %>% on a single line gets hard for a reader (or yourself) to follow. Starting a new line after each pipe allows a reader to easily see which function is operating and makes it easier to follow your logic. Using pipes also has the benefit that extra intermediate variables do not need to be created, freeing up your global environment for objects you are interested in keeping.

For this example we’ve tab-indented subsequent commands and parameters in the pipeline to additionally separate things visually.

# Pass our data.frame 
meta_trimmed.tbl %>% 

  # 1. Filter the data
  filter(`Total Spores (M)` > 0,                       # > 0 spores per infection
         Worm_strain %in% c("N2", "JU1400"),           # Only N2 and JU1400 animals
         `Spore Strain` %in% c("LUAm1", "ERTm5")) %>%  # Only LUAm1 and ERTm5 infections
  
  # 2. Arrange the result
  arrange(`Infection Date`) %>% 
  
  # 3. Grab the experiment column
  select(experiment) %>% 
  
  # 4. Print the first 20 entries
    head(20)  
## # A tibble: 20 x 1
##    experiment                     
##    <chr>                          
##  1 190426_N2_LUAm1_10M_72hpi      
##  2 190426_N2_LUAm1_20M_72hpi      
##  3 190426_JU1400_LUAm1_10M_72hpi  
##  4 190426_JU1400_LUAm1_20M_72hpi  
##  5 190426_N2_ERTm5_1.75M_72hpi    
##  6 190426_N2_ERTm5_3.5M_72hpi     
##  7 190426_JU1400_ERTm5_1.75M_72hpi
##  8 190426_JU1400_ERTm5_3.5M_72hpi 
##  9 190605_N2_LUAm1_1M_24hpi       
## 10 190605_JU1400_LUAm1_1M_24hpi   
## 11 190606_N2_LUAm1_1M_48hpi       
## 12 190606_JU1400_LUAm1_1M_48hpi   
## 13 190607_N2_LUAm1_1M_72hpi       
## 14 190607_JU1400_LUAm1_1M_72hpi   
## 15 190611_N2_LUAm1_6M_0.5hpi      
## 16 190611_JU1400_LUAm1_6M_0.5hpi  
## 17 190611_N2_LUAm1_6M_3hpi        
## 18 190611_JU1400_LUAm1_6M_3hpi    
## 19 190808_N2_LUAm1_5M_24hpi       
## 20 190808_JU1400_LUAm1_5M_24hpi

4.6.0 Retrieve quick summaries of your data with summarise()

We can use summarise(data, ...) to define and retrieve summarised information about our dataset in a simplified way. It creates a new data.frame object summarizing our observations based on the functions supplied, and the result of each function is placed into a new column that we name. This is similar to running the apply() function on specific columns, except that you choose the columns and how each is analysed!

Let’s generate some values based on the Total Spores (M) column of meta_trimmed.tbl.

# Summarise Total Spores (M) with the mean and standard deviation of all rows combined
summarise(meta_trimmed.tbl, 
          totalSpores_mean = mean(`Total Spores (M)`),
          totalSpores_sd = sd(`Total Spores (M)`))

Don’t forget about NA values! Remember that a number of functions can be told to ignore NA values when calculating their results. You’ll have to check their parameter information to be sure. For instance, use ?mean to check whether mean() can ignore NA values.
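
As an illustration of the NA point, here is a minimal sketch on a made-up column containing one missing value (toy.tbl and total_spores are hypothetical). Both mean() and sd() accept na.rm = TRUE to drop NA values before calculating.

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy.tbl <- tibble(total_spores = c(10, 20, NA, 30))

na_summary <- summarise(toy.tbl,
                        # A single NA makes the default result NA
                        mean_with_na = mean(total_spores),
                        # na.rm = TRUE drops NA values before calculating
                        mean_no_na = mean(total_spores, na.rm = TRUE),
                        sd_no_na = sd(total_spores, na.rm = TRUE))

na_summary
```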


4.7.0 Use group_by() to group data based on variable categories

Does the summary from above really make sense? Not exactly. We are looking at Total Spores (M) but there are many different microsporidia strains being tested across different conditions (i.e. worm strains). We should take more variables into consideration. First, let’s summarise by Spore Strain using group_by() along with summarise().

The function group_by() produces a grouped data.frame object which behaves mostly like a standard data.frame but also has meta information about the grouping you’ve specified. You can group by a single variable (column) or multiple ones to produce multi-layered groupings. This underlying meta grouping can be recognized by other dplyr methods such as summarise()!

# Pass along trimmed data
meta_trimmed.tbl %>% 

    # group by Spore strain
    group_by(., `Spore Strain`) %>% 
    
    # Look at the first 10 rows
    head(., 10)

Doesn’t look very different from a regular data.frame does it? What if we try to summarise() with it?


# Pass along trimmed data
meta_trimmed.tbl %>% 

    # group by Spore strain
    group_by(., `Spore Strain`) %>% 
    
    # Summarise the data now
    summarise(., 
              totalSpores_sum = sum(`Total Spores (M)`),
              totalSpores_mean = mean(`Total Spores (M)`),
              totalSpores_sd = sd(`Total Spores (M)`))

Notice that the summarise() created a new tibble and it has the columns totalSpores_sum, totalSpores_mean and totalSpores_sd. You can actually name these columns whatever you want as you generate the code.

We also see the column Spore Strain that we used in the group_by() command. Any columns used in that command will also be included, since they define the groups that the summarise() call works on.

# Here's the equivalent code without piping
summarise(group_by(meta_trimmed.tbl, `Spore Strain`), 
          totalSpores_sum = sum(`Total Spores (M)`),
          totalSpores_mean = mean(`Total Spores (M)`),
          totalSpores_sd = sd(`Total Spores (M)`))
## # A tibble: 10 x 4
##    `Spore Strain` totalSpores_sum totalSpores_mean totalSpores_sd
##    <chr>                    <dbl>            <dbl>          <dbl>
##  1 AWRm78                     21             3.5            0    
##  2 ERTm2                       8             0.667          0.651
##  3 ERTm5                     232.            2.76           2.52 
##  4 ERTm5-96H                  21             1.75           1.49 
##  5 LUAm1                     923             6.79           8.05 
##  6 LUAm1-HK                   20            10              0    
##  7 LUAm1-pel                  20            10              0    
##  8 LUAm1-sup                  20            10              0    
##  9 LUAm3                      60            10              0    
## 10 MAM1                      104             7.43           3.46

Which option looks more “readable” to you? Piping or nesting functions?
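
Grouping by more than one variable works the same way. The sketch below assumes a made-up tibble toy.tbl with hypothetical column names. n() counts the rows in each group, and .groups = "drop" is one way to return an ungrouped tibble afterwards.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(spore_strain = c("LUAm1", "LUAm1", "LUAm1", "ERTm5"),
                  worm_strain = c("N2", "N2", "JU1400", "N2"),
                  total_spores = c(10, 20, 10, 3.5))

grouped_summary <- toy.tbl %>%
  # One group per unique spore strain / worm strain combination
  group_by(spore_strain, worm_strain) %>%
  # n() counts rows per group; .groups = "drop" removes the grouping afterwards
  summarise(n_experiments = n(),
            totalSpores_mean = mean(total_spores),
            .groups = "drop")

grouped_summary
```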


4.8.0 Use mutate() to create new columns in your data frame

Speaking of creating columns, let’s explore the mutate() function. mutate() is a function to create new columns, most often the product of a calculation or concatenation of information. For example, let’s concatenate names from some of the columns by putting Spore Strain and Spore Lot columns together with the paste() function. We can keep the result in a new column, spore_strain_lot.

# Start with our tibble
meta_trimmed.tbl %>% 

    # Use the mutate command to paste two sets of column information together
    mutate(spore_strain_lot = paste(`Spore Strain`, `Spore Lot`, sep = "_")) %>% 
    
    # Peek at the result.
    head()

4.8.0.1 Piping will not automatically save your output!

Up to this point we’ve been doing a lot of piping with %>% and we can see the results in the output of our code but we have NOT been saving the results to a variable. This has two consequences:

  1. We can query, alter, and summarise our data without accidentally changing our original data.
  2. Any data structures we make are not permanent and do not exist in memory after we are done.

If you want to save your data - perhaps after figuring out the series of steps you want to implement - you need to assign it to a variable or at least pipe it to a write*() function to save on disk.
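
Both ways of keeping a result can be sketched briefly. This example assumes a made-up tibble toy.tbl with hypothetical column names, and uses readr's write_csv() as one possible write*() function. The tempfile() path is only a throwaway location for illustration.

```r
library(dplyr)
library(readr)

# Hypothetical toy data (not the course dataset)
toy.tbl <- tibble(spore_strain = c("LUAm1", "ERTm5"),
                  spore_lot = c("2A", "2"))

# Without assignment the new column is printed and then discarded
toy.tbl %>% mutate(strain_lot = paste(spore_strain, spore_lot, sep = "_"))

# Assigning the output of the pipeline keeps it in memory
toy_labelled.tbl <- toy.tbl %>%
  mutate(strain_lot = paste(spore_strain, spore_lot, sep = "_"))

# Piping to write_csv() saves it to disk instead
out_path <- tempfile(fileext = ".csv")
toy_labelled.tbl %>% write_csv(out_path)
```

Note that toy.tbl itself still has only its original two columns after all of this.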

Unlike piping through mutate() without assignment, we can also directly and permanently alter our data structure by adding in new columns with base R. New columns can be easily created using the $col_name syntax. If the column does not already exist, it will be created. Otherwise its data will be overwritten.

# adding columns can also be done using "base R" code:
# This will permanently change meta_trimmed.tbl
meta_trimmed.tbl$spore_strain_lot <- paste(meta_trimmed.tbl$`Spore Strain`, 
                                           meta_trimmed.tbl$`Spore Lot`, 
                                           sep = "_")

head(meta_trimmed.tbl)
## # A tibble: 6 x 11
##   experiment           Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
##   <chr>                <chr>       <chr>          <chr>                    <dbl>
## 1 190426_VC20019_LUAm~ VC20019     LUAm1          2A                           0
## 2 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          10
## 3 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          20
## 4 190426_N2_LUAm1_0M_~ N2          LUAm1          2A                           0
## 5 190426_N2_LUAm1_10M~ N2          LUAm1          2A                          10
## 6 190426_N2_LUAm1_20M~ N2          LUAm1          2A                          20
## # i 6 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## #   `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>,
## #   spore_strain_lot <chr>

4.8.1 Use select() to remove columns

We previously saw how to use select() to get a subgroup of columns we want, but we can also use it to “remove” columns. Note how our last call made a permanent change to meta_trimmed.tbl. To exclude the variable spore_strain_lot from meta_trimmed.tbl, we can use select(), then overwrite meta_trimmed.tbl. Simply add a - (minus) in front of spore_strain_lot.

# Check the column names before and after removing `compound_salinity`
colnames(meta_trimmed.tbl)
##  [1] "experiment"       "Worm_strain"      "Spore Strain"     "Spore Lot"       
##  [5] "Total Spores (M)" "Spores(M)/cm2"    "Infection Date"   "Fixing Date"     
##  [9] "Staining Date"    "Imaging Date"     "..."
meta_trimmed.tbl <- select(meta_trimmed.tbl, ...) # remove column spore_strain_lot
## Error in eval(expr, envir, enclos): '...' used in an incorrect context
head(meta_trimmed.tbl)
## # A tibble: 6 x 11
##   experiment           Worm_strain `Spore Strain` `Spore Lot` `Total Spores (M)`
##   <chr>                <chr>       <chr>          <chr>                    <dbl>
## 1 190426_VC20019_LUAm~ VC20019     LUAm1          2A                           0
## 2 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          10
## 3 190426_VC20019_LUAm~ VC20019     LUAm1          2A                          20
## 4 190426_N2_LUAm1_0M_~ N2          LUAm1          2A                           0
## 5 190426_N2_LUAm1_10M~ N2          LUAm1          2A                          10
## 6 190426_N2_LUAm1_20M~ N2          LUAm1          2A                          20
## # i 6 more variables: `Spores(M)/cm2` <dbl>, `Infection Date` <dbl>,
## #   `Fixing Date` <dbl>, `Staining Date` <dbl>, `Imaging Date` <dbl>, ... <chr>
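The minus syntax can be tried on a small example first. This is only a sketch on a hypothetical tibble (toy.tbl and its columns are made up for illustration). It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy tibble; not part of the course dataset
toy.tbl <- tibble(strain = c("LUAm1", "ERTm5"),
                  lot = c("2A", "3B"),
                  strain_lot = c("LUAm1_2A", "ERTm5_3B"))

# A minus sign in front of a column name drops that column
toy_dropped.tbl <- select(toy.tbl, -strain_lot)
colnames(toy_dropped.tbl)

# select() returns a new tibble; toy.tbl itself still has all three columns
colnames(toy.tbl)
```

The original object only changes if you assign the result back to the same name.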

4.9.0 Use transmute() to create a new data.frame

transmute() is in the same vein as mutate() in that it also creates new columns (variables). However, it drops the existing columns and keeps only the ones you specify in the call. The output of transmute() is a tibble of your new variable(s).

meta_trimmed.tbl %>% 

    # Transmute some new columns
    ...(spore_strain_lot = paste(`Spore Strain`, `Spore Lot`, sep = "_")) %>% 
    
    # look at the unique combinations
    unique()
## Error in ...(., spore_strain_lot = paste(`Spore Strain`, `Spore Lot`, : could not find function "..."
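To make the difference between mutate() and transmute() concrete, here is a small sketch on a hypothetical tibble (toy.tbl and its column names are assumptions for illustration only). It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy tibble; not part of the course dataset
toy.tbl <- tibble(strain = c("LUAm1", "LUAm1", "ERTm5"),
                  lot = c("2A", "2A", "3B"))

# mutate() keeps the existing columns and adds the new one
toy_mutated.tbl <- mutate(toy.tbl, strain_lot = paste(strain, lot, sep = "_"))

# transmute() keeps only the columns named in the call
toy_transmuted.tbl <- transmute(toy.tbl, strain_lot = paste(strain, lot, sep = "_"))

colnames(toy_mutated.tbl)
colnames(toy_transmuted.tbl)

# unique() then collapses the duplicated rows
unique(toy_transmuted.tbl)
```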

Notice that transmute() can accomplish many of the same things as summarise(), but there are some small differences when working with a grouped data frame. Most notably, summarise() collapses each group to a single row, whereas transmute() returns one row per input row.

It is up to you whether you want to keep your data in a tibble or switch to a vector if you are dealing with a single column of data (aka variable). Most dplyr verbs, such as select(), keep your data in a tibble structure even when the result is a single column. Extracting a column with base R syntax such as $ or [[ ]] returns a vector instead.
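A quick way to see the tibble versus vector distinction is to compare the class of each result. This sketch uses a hypothetical tibble (toy.tbl, strain and spores are made-up names) and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy tibble; not part of the course dataset
toy.tbl <- tibble(strain = c("LUAm1", "ERTm5"),
                  spores = c(10, 20))

# select() returns a one-column tibble
spores.tbl <- select(toy.tbl, spores)
class(spores.tbl)

# $ returns the underlying vector
spores.vec <- toy.tbl$spores
class(spores.vec)

# Many base R functions, such as mean(), expect a vector
mean(spores.vec)
```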


Comprehension Question 4.0.0: Using our table meta_trimmed.tbl determine how many different combinations of C. elegans and microsporidia strains were tested (regardless of dosage or other factors). What are the top 10 most common combinations? Hint: use the group_by() %>% summarise() paradigm and check out the n() function.

# comprehension answer code 4.0.0
meta_trimmed.tbl %>% 
... %>% 
... %>% 
... %>% 
...
## Error in eval(expr, envir, enclos): '...' used in an incorrect context

5.0.0 Writing data files

You’ve gone through all that trouble of learning how to import, filter, slice, and sort our datasets. Now comes the time to make sure that work doesn’t go to waste. During larger scripts, there may be intermediate files you want to save just in case an error occurs further along. It can also give you a sense of how things are progressing. Whether it is an intermediate or final dataset that you would like to keep, it’s time to learn how to save your files.

5.1.0 Save your data frame to file with write_csv()

We’re ready to write meta_trimmed.tbl or any other data frame for that matter. In this case we won’t overwrite our old data set but rather just create a second version of it.

Note that there are many ways to write data frames to files, including writing back to excel files! First we’ll keep it simple and within the tidyverse with write_csv() which is a derivative of the write_delim() function. The write_csv() function includes some of the following parameters:

  • x: the data structure you’d like to write to file - preferably a tibble or data.frame.
  • file: the file path where you are sending the output.
  • na: a character string used for NA values - defaults to “NA”.
  • append: logical argument. FALSE (the default) overwrites an existing file, while TRUE appends to it. If the file doesn’t exist in either case, a new file is written.
  • col_names: logical argument to include the column names as part of the file. If unspecified, it will take the opposite value of append.
getwd()
## [1] "C:/Users/mokca/Dropbox/!CAGEF/Course_Materials/Introduction_to_R/2025.09_Intro_to_R/lecture_02_introduction_to_dplyr"
# Write our data to file
...(x = meta_trimmed.tbl,
          file = "data/infection_metadata_trimmed.csv",
          col_names=TRUE)
## Error in ...(x = meta_trimmed.tbl, file = "data/infection_metadata_trimmed.csv", : could not find function "..."
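The interaction between append and col_names is easier to see on a throwaway file. The following is a sketch only: toy.df is a hypothetical data frame, the output goes to a temporary file rather than the data/ folder, and the string "missing" is an arbitrary choice for the na parameter. It assumes the readr package is installed.

```r
library(readr)

# Hypothetical toy data with one missing value
toy.df <- data.frame(strain = c("LUAm1", "ERTm5"),
                     spores = c(10, NA))

# Write to a temporary file so nothing in the working directory is touched
out_file <- tempfile(fileext = ".csv")

# First call: append is FALSE by default, so col_names defaults to TRUE
write_csv(x = toy.df, file = out_file, na = "missing")

# Second call: append = TRUE, so col_names defaults to FALSE
# and no second header line is written
write_csv(x = toy.df, file = out_file, na = "missing", append = TRUE)

# Inspect the raw lines of the file
readLines(out_file)
```

The file should contain a single header line followed by the two data rows written twice.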

5.1.1 Use %>% to direct your output to write_csv()

That’s right, you can pipe your data from filtering etc., over to write_csv(). While you may think this is usually the last step in your pipeline, it will actually write the data to file and then pass the input forward through the next pipe.

This has two implications:

  1. Yes you can use piping and still save your data mid-analysis.
  2. You can assign the final output of your piping to a variable.

Let’s revisit our last summarizing pipeline.

... <-
    # Pass along trimmed data
    meta_trimmed.tbl %>% 
    
    # group by Spore strain
    group_by(., `Spore Strain`) %>% 
    
    # Summarise the data now
    summarise(., 
              totalSpores_sum = sum(`Total Spores (M)`),
              totalSpores_mean = mean(`Total Spores (M)`),
              totalSpores_sd = sd(`Total Spores (M)`)) %>% 
    
    # write your file to output
    write_csv(x = ., file="data/infection_metadata_summary.csv", col_names=TRUE)
## Error in `as_tibble()`:
## ! Column 11 must not have names of the form ... or ..j.
## Use `.name_repair` to specify repair.
## Caused by error in `repaired_names()`:
## ! Names can't be of the form `...` or `..j`.
## x These names are invalid:
##   * "..." at location 1.
# Take a look at the result of the pipeline
write_result
## Error in eval(expr, envir, enclos): object 'write_result' not found
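To illustrate both implications, here is a minimal sketch on a hypothetical tibble, with write_csv() placed in the middle of a pipeline rather than at the end. The names toy.tbl, toy_summary.tbl and spores_sum are made up, and the file is written to a temporary location. It assumes dplyr and readr are installed.

```r
library(dplyr)
library(readr)

# Hypothetical toy tibble; not part of the course dataset
toy.tbl <- tibble(strain = c("LUAm1", "LUAm1", "ERTm5"),
                  spores = c(0, 10, 20))

out_file <- tempfile(fileext = ".csv")

toy_summary.tbl <-
    toy.tbl %>%
    group_by(strain) %>%
    summarise(spores_sum = sum(spores)) %>%
    # write_csv() returns its input invisibly, so the pipe can continue
    write_csv(file = out_file) %>%
    # later steps still receive the summarised tibble
    arrange(desc(spores_sum))

toy_summary.tbl
file.exists(out_file)
```

The file holds the summary as it was at the point of writing, while the variable holds the result of the full pipeline.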

5.2.0 Save your data frame to an excel file with write_xlsx()

Sometimes you may want to write multiple data frames to a single file, such as an xlsx workbook with multiple sheets. This can be a convenient way to keep data together rather than making multiple write_csv() calls.

The writexl package contains the write_xlsx() function which can write the contents of a named list of data frames to multiple sheets. This function includes the following parameters:

  • x: a data.frame, tibble, or a named list of data frames
  • path: the path to write the .xlsx file to
  • col_names: logical parameter for whether or not to write column names at the top of each sheet

Let’s give it a try to wrap up today’s lecture!

# install.packages("writexl", dependencies = TRUE)
# library(writexl)

# Write a list to a single xlsx file
...(x = list("..." = infection_meta.tbl, "metadata_summary" = write_result),
           path = "data/metadata_analysis.xlsx",
           col_names = TRUE
          )
## Error in ...(x = list(... = infection_meta.tbl, metadata_summary = write_result), : could not find function "..."
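As a rough sketch of how a named list maps onto sheets, the example below writes two hypothetical data frames to a temporary workbook. The object names and the sheet names metadata and summary are assumptions for illustration. It assumes the writexl package is installed.

```r
library(writexl)

# Hypothetical toy data frames; not part of the course dataset
toy_meta.df <- data.frame(strain = c("LUAm1", "ERTm5"),
                          spores = c(10, 20))
toy_summary.df <- data.frame(strain_count = 2)

# Write to a temporary file so nothing in the working directory is touched
out_file <- tempfile(fileext = ".xlsx")

# Each name in the list becomes the name of a sheet in the workbook
write_xlsx(x = list(metadata = toy_meta.df, summary = toy_summary.df),
           path = out_file,
           col_names = TRUE)

file.exists(out_file)
```

If the readxl package is available, readxl::excel_sheets(out_file) is one way to confirm the sheet names.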

6.0.0 Class summary

That’s a wrap for our second class on R! You’ve made it through and we’ve learned about the following:

  1. Installing and loading packages in R.
  2. Importing plain text and excel files.
  3. Functions for inspecting your data frame.
  4. Basic filtering, sorting and mutating of data frames with the dplyr package.
  5. Exporting data files

6.1.0 Submit your completed skeleton notebook (2% of final grade)

At the end of this lecture a Quercus assignment portal will be available to submit an RMD version of your completed skeletons from today (including the comprehension question answers!). These will be due one week later, before the next lecture. Each lecture skeleton is worth 2% of your final grade but a bonus 0.5% will also be awarded for submissions made within 24 hours from the end of lecture (i.e. 1600 hours the following day). To save your notebook:

  1. From the RStudio Notebook in the lower right pane (Files tab), select the skeleton file checkbox (left-hand side of the file name)
  2. Under the More button drop down, select the Export button and save to your hard drive.
  3. Upload your RMD file to the Quercus skeleton portal.

6.2.0 Post-lecture assessment (6% of final grade)

Soon after the end of each lecture, a homework assignment will be available for you in DataCamp. Your assignment is to complete chapters from the Data Manipulation with dplyr course: Transforming data with dplyr (900 points); Aggregating data (1050 points); and Selecting and transforming data (750 points) for a total of 2700 points. This is a pass-fail assignment, and in order to pass you need to achieve at least 2025 points (75%) of the total possible points. Note that when you take hints from the DataCamp chapter, it will reduce your total earned points for that chapter.

In order to properly assess your progress on DataCamp, at the end of each chapter, please print a PDF of the summary. You can do so by following these steps:

  1. Navigate to the Learn section along the top menu bar of DataCamp. This will bring you to the various courses you have been assigned under My Assignments.
  2. Click on your completed assignment and expand each chapter of the course by clicking on the VIEW CHAPTER DETAILS link. Do this for all sections on the page!
  3. Carefully highlight/select the page starting with the course title (ie Introduction to R) and going to the end of the last section. Avoid using ctrl + A to highlight all of the visible text.
  4. Print the page from your browser menu and save as a single PDF. In the options, be sure to print “selection” or you may not be able to print the full page. It should print out something like what follows, except with more chapter info.

You may need to take several screenshots if you cannot print it all in a single try. Submit the file(s) or a combined PDF for the homework to the assignment section of Quercus. By submitting your scores for each section, and chapter, we can keep track of your progress, identify knowledge gaps, and produce a standardized way for you to check on your assignment “grades” throughout the course.

You will have until 1259 hours on Tuesday, September 16th to submit your assignment (right before the next lecture).


6.3.0 Acknowledgements

Revision 1.0.0: materials prepared in R Markdown by Oscar Montoya, M.Sc. Bioinformatician, Education and Outreach, CAGEF.

Revision 1.1.0: edited and prepared for CSB1020H F LEC0142, 09-2021 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.

Revision 1.1.1: edited and prepared for CSB1020H F LEC0142, 09-2022 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.

Revision 1.1.2: edited and prepared for CSB1020H F LEC0142, 09-2023 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.

Revision 1.2.0: edited and prepared for CSB1020H F LEC0142, 09-2024 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.

Revision 1.2.1: edited and prepared for CSB1020H F LEC0142, 09-2025 by Calvin Mok, Ph.D. Bioinformatician, Education and Outreach, CAGEF.


6.4.0 Your DataCamp academic subscription

This class is supported by DataCamp, the most intuitive learning platform for data science and analytics. Learn any time, anywhere and become an expert in R, Python, SQL, and more. DataCamp’s learn-by-doing methodology combines short expert videos and hands-on-the-keyboard exercises to help learners retain knowledge. DataCamp offers 350+ courses by expert instructors on topics such as importing data, data visualization, and machine learning. They’re constantly expanding their curriculum to keep up with the latest technology trends and to provide the best learning experience for all skill levels. Join over 6 million learners around the world and close your skills gap.

Your DataCamp academic subscription grants you free access to the DataCamp’s catalog for 6 months from the beginning of this course. You are free to look for additional tutorials and courses to help grow your skills for your data science journey. Learn more (literally!) at DataCamp.com.


7.0.0 Appendix I: Reading files in R using the base package

You may find for one reason or another that you prefer to use the base commands of R to import data. Here you’ll find a quick primer on using the read.csv() function.

7.1.0 tsv/csv files can be read by read.csv()

Let’s read our infection_meta.csv data file into R. While we do these exercises, we are going to become friends with the help() function. Let’s start by using the read.csv() function, which is a simplified version of the function read.table(). Both of these functions are part of the utils package, which is loaded automatically when R starts. The read.csv() function accepts, among others, the following parameters:

  • file: the file name we want to import
  • header: logical parameter noting if your imported table has a header or not. Uses TRUE as the default value.
  • sep: character parameter denoting how your fields are separated. Uses , as the default value.
library(tidyverse)

# Remember the head() function? We'll import our file but just look at the first 6 rows of it
head(read.csv("data/infection_meta.csv"))
##                       experiment experimenter                  description
## 1  190426_VC20019_LUAm1_0M_72hpi           CM Wild isolate phenoMIP retest
## 2 190426_VC20019_LUAm1_10M_72hpi           CM Wild isolate phenoMIP retest
## 3 190426_VC20019_LUAm1_20M_72hpi           CM Wild isolate phenoMIP retest
## 4       190426_N2_LUAm1_0M_72hpi           CM Wild isolate phenoMIP retest
## 5      190426_N2_LUAm1_10M_72hpi           CM Wild isolate phenoMIP retest
## 6      190426_N2_LUAm1_20M_72hpi           CM Wild isolate phenoMIP retest
##   Infection.Date Plate.Number Worm_strain Total.Worms Spore.Strain Spore.Lot
## 1         190423            1     VC20019        1000        LUAm1        2A
## 2         190423            2     VC20019        1000        LUAm1        2A
## 3         190423            3     VC20019        1000        LUAm1        2A
## 4         190423            4          N2        1000        LUAm1        2A
## 5         190423            5          N2        1000        LUAm1        2A
## 6         190423            6          N2        1000        LUAm1        2A
##   Lot.concentration Total.Spores..M. Total.ul.spore Infection.Round
## 1            176000                0        0.00000               1
## 2            176000               10       56.81818               1
## 3            176000               20      113.63636               1
## 4            176000                0        0.00000               1
## 5            176000               10       56.81818               1
## 6            176000               20      113.63636               1
##   X40X.OP50..mL. Plate.Size Spores.M..cm2 Time.plated Time.Incubated Temp
## 1           0.15          6     0.0000000        1300           1600   21
## 2           0.15          6     0.3538570        1300           1600   21
## 3           0.15          6     0.7077141        1300           1600   21
## 4           0.15          6     0.0000000        1300           1600   21
## 5           0.15          6     0.3538570        1300           1600   21
## 6           0.15          6     0.7077141        1300           1600   21
##   timepoint infection.type Fixing.Date         Location Staining.Date
## 1        72     continuous      190426 Sample exhausted        190513
## 2        72     continuous      190426 Sample exhausted        190513
## 3        72     continuous      190426 Sample exhausted        190513
## 4        72     continuous      190426 Sample exhausted        190430
## 5        72     continuous      190426 Sample exhausted        190513
## 6        72     continuous      190426 Sample exhausted        190513
##         Stain.type Slide.date Slide.number Slide.Box Imaging.Date
## 1 Sp.9 FISH + DY96     190515            1         2       190516
## 2 Sp.9 FISH + DY96     190515            2         2       190516
## 3 Sp.9 FISH + DY96     190515            3         2       190516
## 4             DY96     190501            4         2       190502
## 5 Sp.9 FISH + DY96     190515            5         2       190516
## 6 Sp.9 FISH + DY96     190515            6         2       190516
# Note that unlike read_csv() the result here is strictly a dataframe
str(read.csv("data/infection_meta.csv"))
## 'data.frame':    276 obs. of  29 variables:
##  $ experiment       : chr  "190426_VC20019_LUAm1_0M_72hpi" "190426_VC20019_LUAm1_10M_72hpi" "190426_VC20019_LUAm1_20M_72hpi" "190426_N2_LUAm1_0M_72hpi" ...
##  $ experimenter     : chr  "CM" "CM" "CM" "CM" ...
##  $ description      : chr  "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" "Wild isolate phenoMIP retest" ...
##  $ Infection.Date   : int  190423 190423 190423 190423 190423 190423 190423 190423 190423 190423 ...
##  $ Plate.Number     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Worm_strain      : chr  "VC20019" "VC20019" "VC20019" "N2" ...
##  $ Total.Worms      : int  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Spore.Strain     : chr  "LUAm1" "LUAm1" "LUAm1" "LUAm1" ...
##  $ Spore.Lot        : chr  "2A" "2A" "2A" "2A" ...
##  $ Lot.concentration: int  176000 176000 176000 176000 176000 176000 176000 176000 176000 176000 ...
##  $ Total.Spores..M. : num  0 10 20 0 10 20 0 10 20 0 ...
##  $ Total.ul.spore   : num  0 56.8 113.6 0 56.8 ...
##  $ Infection.Round  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ X40X.OP50..mL.   : num  0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 0.15 ...
##  $ Plate.Size       : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ Spores.M..cm2    : num  0 0.354 0.708 0 0.354 ...
##  $ Time.plated      : int  1300 1300 1300 1300 1300 1300 1300 1300 1300 1300 ...
##  $ Time.Incubated   : int  1600 1600 1600 1600 1600 1600 1600 1600 1600 1600 ...
##  $ Temp             : int  21 21 21 21 21 21 21 21 21 21 ...
##  $ timepoint        : chr  "72" "72" "72" "72" ...
##  $ infection.type   : chr  "continuous" "continuous" "continuous" "continuous" ...
##  $ Fixing.Date      : int  190426 190426 190426 190426 190426 190426 190426 190426 190426 190426 ...
##  $ Location         : chr  "Sample exhausted" "Sample exhausted" "Sample exhausted" "Sample exhausted" ...
##  $ Staining.Date    : int  190513 190513 190513 190430 190513 190513 190430 190513 190513 190430 ...
##  $ Stain.type       : chr  "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "Sp.9 FISH + DY96" "DY96" ...
##  $ Slide.date       : int  190515 190515 190515 190501 190515 190515 190501 190515 190515 190501 ...
##  $ Slide.number     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Slide.Box        : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ Imaging.Date     : int  190516 190516 190516 190502 190516 190516 190502 190516 190516 190502 ...
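The header and sep parameters matter most when a file is not a standard comma-separated table with a header row. As a small sketch, the example below creates a hypothetical tab-separated file without a header and reads it with read.csv(). The file contents and column names are made up, and col.names is an additional argument that read.csv() passes through to read.table().

```r
# Write a small tab-separated file without a header to a temporary location
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("LUAm1\t2A\t10",
             "ERTm5\t3B\t20"),
           tsv_file)

# header = FALSE: the first line is data, not column names
# sep = "\t": fields are separated by tabs rather than commas
toy.df <- read.csv(file = tsv_file,
                   header = FALSE,
                   sep = "\t",
                   col.names = c("strain", "lot", "spores"))

str(toy.df)
```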

8.0.0 Appendix II: Working with NA values

In addition to the functions we discussed in class there are some additional methods for dealing with NA values that can be helpful, depending on the structure of your data.

# Set up our data structures again
na_vector <- c(5, 6, NA, 7, 7, NA)

na_vector
## [1]  5  6 NA  7  7 NA
# A data.frame with NA values
counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
                     Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
                     Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))

counts
##       Site1 Site2 Site3
## geneA     2    15    10
## geneB     4    NA     7
## geneC    12    27    13
## geneD     8    28    NA

8.1.0 The na.omit() function will remove NA entries

In addition to our combination of functions from class, the na.omit() function can return an object where the NA values have been deleted in a listwise manner. This means that any incomplete case (i.e. a row in a data.frame containing at least one NA) is removed in its entirety. Keeping this in mind, you can also use this on a vector.

# Roughly equivalent to our previous, more complex code using is.na() and which() in combination
na.omit(na_vector)
## [1] 5 6 7 7
## attr(,"na.action")
## [1] 3 6
## attr(,"class")
## [1] "omit"
# But under the hood it is doing something slightly different
# see how it works on data.frames?
na.omit(counts)
##       Site1 Site2 Site3
## geneA     2    15    10
## geneC    12    27    13
# Attempt to apply the log function to non-NA observations only.
# Note: na.omit(log) returns the log function unchanged, so the NA values below are not removed.
#?na.omit
apply(counts, MARGIN = 1, na.omit(log))
##           geneA    geneB    geneC    geneD
## Site1 0.6931472 1.386294 2.484907 2.079442
## Site2 2.7080502       NA 3.295837 3.332205
## Site3 2.3025851 1.945910 2.564949       NA
# Read more about apply() to learn more about why our data.frame is now transposed
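The apply() call above still returns NA values because na.omit() is applied to the function log rather than to the data. One possible alternative, sketched here with the same counts data frame, is to wrap both steps in an anonymous function so that na.omit() acts on each row before log() is taken. The name log_counts is made up for this example.

```r
# Same data.frame as above
counts <- data.frame(Site1 = c(geneA = 2, geneB = 4, geneC = 12, geneD = 8),
                     Site2 = c(geneA = 15, geneB = NA, geneC = 27, geneD = 28),
                     Site3 = c(geneA = 10, geneB = 7, geneC = 13, geneD = NA))

# Drop the NA values within each row before taking the log
log_counts <- apply(counts, MARGIN = 1, function(x) log(na.omit(x)))

# Rows lose different numbers of values, so apply() returns a list here
class(log_counts)
lengths(log_counts)
```

Because the rows no longer have equal lengths, the result cannot be a matrix, which is something to keep in mind before passing it to later steps.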

8.2.0 There are similar functions to handle other types of null values

You can similarly deal with NaN’s in R. NaN’s (not a number) are treated as NAs (not available) by is.na(), but NAs are not NaN’s, so is.nan() returns TRUE only for NaN. NaN’s arise from undefined numeric operations such as 0/0. Some packages may output NAs, NaN’s, or Inf/-Inf (is.infinite() finds these, while is.finite() returns FALSE for all of NA, NaN, Inf and -Inf).

na_vector <- c(5, 6, NA, 7, 7, NA)
nan_vector <- c(5, 6, NaN, 7, 7, 0/0)

is.na(na_vector)
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
is.na(nan_vector)
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
is.nan(nan_vector) 
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
# These types of operations are very useful when working with conditional statements (if else, while, etc.).
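As a short sketch of how these checks behave side by side and how they might feed a conditional statement, consider the hypothetical vector below (values and clean_values are made-up names).

```r
# A vector mixing an ordinary number with each kind of special value
values <- c(5, NA, NaN, Inf, -Inf)

is.na(values)        # TRUE for both NA and NaN
is.nan(values)       # TRUE for NaN only
is.infinite(values)  # TRUE for Inf and -Inf
is.finite(values)    # TRUE only for ordinary numbers

# Use the check inside a conditional to keep only usable values
if (any(!is.finite(values))) {
    clean_values <- values[is.finite(values)]
} else {
    clean_values <- values
}

clean_values
```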